In these notebooks, we provide an in-depth example of how the GEM ML framework can be used to segment deforested areas, using Sentinel-2 imagery as input and the TMF dataset as reference. The idea is to use a neural network (NN) model for the analysis. Thanks to the flexibility of the GEM ML framework, we can easily substitute the model in the future by adjusting only the configuration file. We will have a look at each notebook separately.
Authors: Michael Engel (m.engel@tum.de) and Joana Reuss (joana.reuss@tum.de)
In this notebook, we will train, validate, and test the model of choice.
import os
import sys
import platform
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt
import time
import natsort
import torch
import torch.multiprocessing as mp
from tensorboardX import SummaryWriter
from tensorboard import notebook
from sentinelhub import SHConfig, BBox, CRS, DataCollection, UtmZoneSplitter
from eolearn.core import FeatureType, EOPatch, MergeEOPatchesTask, MapFeatureTask, MergeFeatureTask, ZipFeatureTask, LoadTask, EONode, EOWorkflow, EOExecutor, OverwritePermission, SaveTask
from eolearn.io import SentinelHubDemTask, ExportToTiffTask, SentinelHubInputTask, SentinelHubEvalscriptTask, get_available_timestamps, ImportFromTiffTask
from eolearn.mask import CloudMaskTask, JoinMasksTask
from eolearn.features.feature_manipulation import SpatialResizeTask
from eolearn.features.utils import ResizeMethod, ResizeLib
import rasterio
import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon,Point
import folium
from folium import plugins as foliumplugins
from libs.ConfigME import Config, importME
from libs.MergeTDigests import mergeTDigests
from libs.QuantileScaler_eolearn import QuantileScaler_eolearn_tdigest
from libs.Dataset_eolearn import Dataset_eolearn
from libs import AugmentME
from libs import ExecuteME
from tasks.TDigestTask import TDigestTask
from tasks.PickIdxTask import PickIdxTask
from tasks.SaveValidTask import SaveValidTask
from tasks.PyTorchTasks import ModelForwardTask
from utils.rasterio_reproject import rasterio_reproject
from utils.transforms import batchify, predict, mover, Torchify
from utils.parse_time_interval_observations import parse_time_interval_observations
print("Working Directory:",os.getcwd())
print("Environment:",os.environ['CONDA_DEFAULT_ENV'])
print("Executable:",sys.executable)
Incorporating libs!
Incorporating tasks!
Incorporating utils!
Working Directory: /home/michael/Documents/GEM/TUM-Git/eo-learn-examples/GEM-ML/Example_DeforestationDetection
Environment: eolearn_water
Executable: /home/michael/anaconda3/envs/eolearn_water/bin/python
First, we load our configuration file, which provides all the information we need throughout the script, and linuxify our paths (in case you are working on a Windows machine), as the eo-learn filesystem manager does not support backslashes for now.
#%% load configuration file
config = Config.LOAD("config.dill")
#%% linuxify
config.linuxify()
Next, we gather the paths of all samples within our training, validation, and testing datasets, respectively.
#%% training samples
paths_train = [os.path.join(config["dir_train"],file).replace("\\","/") for file in os.listdir(config["dir_train"])]
#%% validation samples
paths_validation = [os.path.join(config["dir_validation"],file).replace("\\","/") for file in os.listdir(config["dir_validation"])]
#%% testing samples
paths_test = [os.path.join(config["dir_test"],file).replace("\\","/") for file in os.listdir(config["dir_test"])]
As discussed in the third notebook, we want to apply quantile scaling to our data. We load the scaler we already defined in the previous notebook.
Scaler = QuantileScaler_eolearn_tdigest.LOAD(os.path.join(config["dir_results"],config["savename_scaler"]))
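Conceptually, a quantile scaler maps raw band values into a fixed range via empirical quantiles of the data distribution. The following is a minimal NumPy sketch of that idea (a simplified stand-in; the actual `QuantileScaler_eolearn_tdigest` aggregates quantiles with t-digests across patches):

```python
import numpy as np

def quantile_scale(values, reference, q_low=0.01, q_high=0.99):
    """Scale `values` linearly so that the [q_low, q_high] quantile range
    of `reference` maps onto [0, 1] (hypothetical helper for illustration,
    not the GEM ML implementation)."""
    lo = np.quantile(reference, q_low)
    hi = np.quantile(reference, q_high)
    return (np.asarray(values, dtype=float) - lo) / (hi - lo)

reference = np.linspace(0.0, 100.0, 101)  # toy band statistics
scaled = quantile_scale([1.0, 50.5, 99.0], reference)
```

Using quantiles instead of the raw minimum and maximum makes the scaling robust to outliers such as undetected clouds or sensor artifacts.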
Now, we are ready to define our datasets using Dataset_eolearn!
Remember that PyTorch expects channels-first tensors of shape [batch_size x channels x height x width], whereas eo-learn stores features channels-last. The QuantileScaler_eolearn_tdigest already takes care of this for the data; for the reference and the mask, we use the Torchify class provided within the Dataset_eolearn module by setting transform=Torchify(1).
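The axis move behind this can be illustrated with plain NumPy. An eo-learn timeless feature comes as [height, width, channels], while a PyTorch model wants [channels, height, width]; presumably `Torchify(1)` performs exactly this kind of reordering (an assumption based on its use here):

```python
import numpy as np

# eo-learn layout: [height, width, channels]
patch = np.zeros((256, 256, 6), dtype=np.float32)

# PyTorch layout: [channels, height, width] -- move the last axis to the front
chw = np.moveaxis(patch, -1, 0)
```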
#%% training dataset
dataset_train = Dataset_eolearn(
paths = paths_train,
feature_data = (FeatureType.DATA,"data"),
feature_reference = (FeatureType.MASK_TIMELESS,"reference"),
feature_mask = (FeatureType.MASK_TIMELESS,"mask_reference"),
transform_data = Scaler,
transform_reference = Torchify(1),
transform_mask = Torchify(1),
return_idx = True,
return_path = False,
torchdevice = None,
torchtype_data = torch.FloatTensor,
torchtype_reference = torch.LongTensor,
torchtype_mask = torch.LongTensor,
)
#%% validation dataset
dataset_validation = Dataset_eolearn(
paths = paths_validation,
feature_data = (FeatureType.DATA,"data"),
feature_reference = (FeatureType.MASK_TIMELESS,"reference"),
feature_mask = (FeatureType.MASK_TIMELESS,"mask_reference"),
transform_data = Scaler,
transform_reference = Torchify(1),
transform_mask = Torchify(1),
return_idx = True,
return_path = False,
torchdevice = None,
torchtype_data = torch.FloatTensor,
torchtype_reference = torch.LongTensor,
torchtype_mask = torch.LongTensor,
)
#%% testing dataset
dataset_test = Dataset_eolearn(
paths = paths_test,
feature_data = (FeatureType.DATA,"data"),
feature_reference = (FeatureType.MASK_TIMELESS,"reference"),
feature_mask = (FeatureType.MASK_TIMELESS,"mask_reference"),
transform_data = Scaler,
transform_reference = Torchify(1),
transform_mask = Torchify(1),
return_idx = True,
return_path = False,
torchdevice = None,
torchtype_data = torch.FloatTensor,
torchtype_reference = torch.LongTensor,
torchtype_mask = torch.LongTensor,
)
Let's test our datasets!
sample_train = dataset_train[:config["batch_size"]]
print('Training Data Shape:',sample_train[0].shape)
print('Training Reference Shape:',sample_train[1].shape)
print('Training Mask Shape:',sample_train[2].shape)
print()
sample_validation = dataset_validation[:config["max_batch_size"]]
print('Validation Data Shape:',sample_validation[0].shape)
print('Validation Reference Shape:',sample_validation[1].shape)
print('Validation Mask Shape:',sample_validation[2].shape)
print()
sample_test = dataset_test[:config["max_batch_size"]]
print('Testing Data Shape:',sample_test[0].shape)
print('Testing Reference Shape:',sample_test[1].shape)
print('Testing Mask Shape:',sample_test[2].shape)
print()
Training Data Shape: torch.Size([12, 6, 256, 256])
Training Reference Shape: torch.Size([12, 256, 256])
Training Mask Shape: torch.Size([12, 256, 256])

Validation Data Shape: torch.Size([2, 6, 256, 256])
Validation Reference Shape: torch.Size([2, 256, 256])
Validation Mask Shape: torch.Size([2, 256, 256])

Testing Data Shape: torch.Size([2, 6, 256, 256])
Testing Reference Shape: torch.Size([2, 256, 256])
Testing Mask Shape: torch.Size([2, 256, 256])
Let's define a dataloader for each dataset.
For validation and testing we double the maximal batch size, since no gradient computation is needed there.
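How `batch_size` and `drop_last` determine the number of batches a dataloader yields can be illustrated without PyTorch (a pure-Python sketch of the arithmetic, not the DataLoader itself):

```python
def n_batches(n_samples, batch_size, drop_last):
    """Number of batches a loader yields: with drop_last=True, a trailing
    partial batch is discarded; otherwise it is returned as a smaller batch."""
    full, rest = divmod(n_samples, batch_size)
    return full if (drop_last or rest == 0) else full + 1

# e.g. 50 samples with batch_size 12: the trailing batch of 2 samples
# is dropped when drop_last=True, kept when drop_last=False
four = n_batches(50, 12, drop_last=True)
five = n_batches(50, 12, drop_last=False)
```

Note that with `drop_last=True`, as set below, a few samples at the end of each epoch may never be seen; for training with shuffling this is usually harmless since the dropped samples change every epoch.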
#%% training dataloader
dataloader_train = torch.utils.data.DataLoader(
dataset = dataset_train,
batch_size = config["batch_size"],
shuffle = True,
sampler = None,
batch_sampler = None,
num_workers = 0 if platform.system()=="Windows" else config["threads"],
collate_fn = None,
pin_memory = False,
drop_last = True,
timeout = 0,
worker_init_fn = None,
multiprocessing_context = None,
generator = None
)
#%% validation dataloader
dataloader_validation = torch.utils.data.DataLoader(
dataset = dataset_validation,
batch_size = config["max_batch_size"]*2,
shuffle = False,
sampler = None,
batch_sampler = None,
num_workers = 0 if platform.system()=="Windows" else config["threads"],
collate_fn = None,
pin_memory = False,
drop_last = True,
timeout = 0,
worker_init_fn = None,
multiprocessing_context = None,
generator = None
)
#%% testing dataloader
dataloader_test = torch.utils.data.DataLoader(
dataset = dataset_test,
batch_size = config["max_batch_size"]*2,
shuffle = False,
sampler = None,
batch_sampler = None,
num_workers = 0 if platform.system()=="Windows" else config["threads"],
collate_fn = None,
pin_memory = False,
drop_last = True,
timeout = 0,
worker_init_fn = None,
multiprocessing_context = None,
generator = None
)
It is time to initialize our model.
To do so, we use the importME method. It allows us to stay flexible regarding the chosen model architecture and easily adapt it in the future.
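The idea behind such a string-driven import can be sketched with the standard library: resolve a dotted path from the configuration file into a Python object at runtime. This is a hypothetical minimal stand-in, not the actual `importME` implementation:

```python
import importlib

def import_by_name(dotted):
    """Resolve a 'module.attr' string into the object it names
    (hypothetical helper illustrating configuration-driven imports)."""
    module_name, _, attr = dotted.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)

# swapping the model architecture then only requires changing the string
# in the configuration file, e.g. "math.sqrt" here as a toy stand-in
sqrt = import_by_name("math.sqrt")
```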
#%% import model
module_model = importME(config["module_model"])
#%% initialize model
model = module_model(**config["kwargs_model"])
We want to augment the model such that it fits into our training pipeline. We will add the following functionalities:
- general IO (saving and loading),
- checkpoint saving,
- a gradient method,
- querying the number of parameters.

The benefit of adding these methods becomes clear when you change the chosen architecture without intending to change the IO interface of your code.
#%% general IO
AugmentME.augment_IO(model,savekey='save',loadkey='load',mode='torch')
#%% checkpoint saving
AugmentME.augment_checkpoint(model,key='save_checkpoint',mode='torch')
#%% gradient method
AugmentME.augment_gradient(model,key='get_gradient',mode=None)
#%% number of parameters
AugmentME.augment_Ntheta(model,key="get_Ntheta")
True
To check whether the augmentation worked, let's have a look at the number of parameters.
#%% number of parameters
print("Number of parameters:",model.get_Ntheta())
Number of parameters: 22447636
Before we can start training our model, we have to define a loss function.
We will keep it as flexible as the model itself and use importME.
loss_function = importME(config["module_loss"])(**config["kwargs_loss"])
No optimization without an optimizer! To avoid device mismatch errors, we have to send our model to the device before we define our optimizer.
#%% send model to device to avoid device errors
model.to(config["device"])
DeepLabV3Plus(
(encoder): ResNetEncoder(
(conv1): Conv2d(6, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
(layer1): Sequential(
(0): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(1): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(2): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(layer2): Sequential(
(0): BasicBlock(
(conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(2): BasicBlock(
(conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(3): BasicBlock(
(conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(layer3): Sequential(
(0): BasicBlock(
(conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(2): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(3): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(4): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(5): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(layer4): Sequential(
(0): BasicBlock(
(conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1), dilation=(2, 2), bias=False)
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(2): BasicBlock(
(conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
)
(decoder): DeepLabV3PlusDecoder(
(aspp): Sequential(
(0): ASPP(
(convs): ModuleList(
(0): Sequential(
(0): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(1): ASPPSeparableConv(
(0): SeparableConv2d(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(12, 12), dilation=(12, 12), groups=512, bias=False)
(1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(2): ASPPSeparableConv(
(0): SeparableConv2d(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(24, 24), dilation=(24, 24), groups=512, bias=False)
(1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(3): ASPPSeparableConv(
(0): SeparableConv2d(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(36, 36), dilation=(36, 36), groups=512, bias=False)
(1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(4): ASPPPooling(
(0): AdaptiveAvgPool2d(output_size=1)
(1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU()
)
)
(project): Sequential(
(0): Conv2d(1280, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
(3): Dropout(p=0.5, inplace=False)
)
)
(1): SeparableConv2d(
(0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
(1): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
(2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU()
)
(up): UpsamplingBilinear2d(scale_factor=4.0, mode=bilinear)
(block1): Sequential(
(0): Conv2d(64, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(block2): Sequential(
(0): SeparableConv2d(
(0): Conv2d(304, 304, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=304, bias=False)
(1): Conv2d(304, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
)
(segmentation_head): SegmentationHead(
(0): Conv2d(256, 4, kernel_size=(1, 1), stride=(1, 1))
(1): UpsamplingBilinear2d(scale_factor=4.0, mode=bilinear)
(2): Activation(
(activation): Identity()
)
)
)
Now, we can define our optimizer with the model parameters already on our chosen device!
optimizer = importME(config["module_optimizer"])(model.parameters(),**config["kwargs_optimizer"])
To assess the performance of our model, we load a metric.
metric = importME(config["module_metric"])
Of course, we would like to track the progress of our training procedure. Hence, we define a TensorBoard SummaryWriter.
writer = SummaryWriter(config["dir_tensorboard"])
The TensorBoard SummaryWriter enables some nice extras, for example adding a graph of our model.
writer.add_graph(model, sample_train[0].to(config["device"]))
/home/michael/anaconda3/envs/eolearn_water/lib/python3.8/site-packages/segmentation_models_pytorch/base/model.py:16: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! if h % output_stride != 0 or w % output_stride != 0:
Furthermore, we would like to make our experiment reproducible. Hence, we set the seeds such that all random number generation and shuffling is done in a deterministic manner.
#%% reproducibility
np.random.seed(config["seed"])
torch.manual_seed(config["seed"])
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
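The effect of seeding is easy to demonstrate: re-seeding the generator reproduces the exact same random draws, which is what makes shuffling and weight initialization repeatable across runs (shown here with NumPy only):

```python
import numpy as np

# identical seeds yield identical random sequences
np.random.seed(42)
first = np.random.rand(3)

np.random.seed(42)
second = np.random.rand(3)
```

The same holds for `torch.manual_seed`; the cuDNN flags additionally force deterministic GPU kernels at some cost in speed.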
In case the training procedure exits prematurely, we insert a resume flag here. It allows resuming either from a chosen checkpoint or automatically from the most recent one.
#%% resume flag
resume = False

#%% resume case
if resume:
    if resume is True:
        resume = os.path.join(config["dir_checkpoints"], natsort.natsorted(os.listdir(config["dir_checkpoints"]))[-1])
    print(f'Loading Checkpoint {resume}!')
    checkpoint = torch.load(resume, map_location=config["device"])
    model.load_state_dict(checkpoint['model_state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
    loss = checkpoint['loss']
    bestloss = checkpoint['bestloss']
    bestmetric = checkpoint['bestmetric']
    epoch_ = checkpoint['epoch'] + 1
    logstep_ = checkpoint['logstep']
else:
    epoch_ = 0
    logstep_ = 0
    bestloss = np.inf
    bestmetric = [0 for _ in range(len(metric))] if isinstance(metric, (list, np.ndarray)) else 0
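The "most recent checkpoint" is found via a natural sort of the file names, so that e.g. `checkpoint_10` sorts after `checkpoint_9` rather than after `checkpoint_1`. A minimal sketch of what `natsort.natsorted` does (with hypothetical file names; use the library itself in practice):

```python
import re

def latest_checkpoint(filenames):
    """Return the last file name under a natural sort: digit runs are
    compared numerically, everything else lexicographically."""
    def key(name):
        return [int(t) if t.isdigit() else t for t in re.split(r"(\d+)", name)]
    return sorted(filenames, key=key)[-1]

ckpts = ["checkpoint_2.pt", "checkpoint_10.pt", "checkpoint_9.pt"]
latest = latest_checkpoint(ckpts)
```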
model.train()
Let's start the training loop!
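The loop below keeps the effective batch size of `config["batch_size"]` while never pushing more than `config["max_batch_size"]` samples through the GPU at once: each batch is split into minibatches, each minibatch loss is scaled by its share of the batch before `backward()`, the gradients accumulate, and a single optimizer step is taken. A minimal sketch of this gradient-accumulation pattern, using a toy linear model and plain MSE loss instead of the segmentation model (all names here are hypothetical, not part of the GEM ML framework):

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 2)          # toy stand-in for the segmentation model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_function = torch.nn.MSELoss(reduction="mean")

x = torch.randn(12, 4)                 # one "full" batch of 12 samples
y = torch.randn(12, 2)
max_batch_size = 5                     # memory limit per forward pass

optimizer.zero_grad(set_to_none=True)
batchcount = -(-len(x) // max_batch_size)   # ceil division -> 3 minibatches
for p in range(batchcount):
    lo, hi = p * max_batch_size, min((p + 1) * max_batch_size, len(x))
    out = model(x[lo:hi])
    # scale by the minibatch share so the accumulated gradients match one
    # backward pass over the full batch
    loss = loss_function(out, y[lo:hi]) * (hi - lo) / len(x)
    loss.backward()                    # gradients accumulate in .grad
optimizer.step()                       # single update for the full batch
```

The notebook's version additionally masks invalid pixels and normalizes by valid-pixel counts rather than by sample counts, but the accumulation structure is the same.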
#%% training loop
print('Start training...')
logstep = -1+logstep_
for epoch in range(config["n_epochs"]-epoch_):
    epoch = epoch+epoch_
    for step, (x, y, mask, idx) in enumerate(dataloader_train):
        print('epoch %i step %i'%(epoch,step))
        #%%% clean cache of GPU
        torch.cuda.empty_cache()
        #%%% compute logstep
        logstep = logstep+1
        #%%% zero gradients
        optimizer.zero_grad(set_to_none=True)
        #%%% determine number of minibatches
        if type(x)==list:
            batchcount = int(np.ceil(len(x[0])/config["max_batch_size"]))
        else:
            batchcount = int(np.ceil(len(x)/config["max_batch_size"]))
        out = []
        loss = 0
        #%%% minibatch loop
        for p in range(batchcount):
            #%%%% determine indices
            lowidx = p*config["max_batch_size"]
            if p==batchcount-1:
                if type(x)==list:
                    highidx = len(x[0])
                else:
                    highidx = len(x)
            else:
                highidx = (p+1)*config["max_batch_size"]
            if type(x)==list:
                tmp_x = [torch.index_select(x_,dim=0,index=torch.arange(lowidx,highidx)).detach() for x_ in x]
            else:
                tmp_x = torch.index_select(x,dim=0,index=torch.arange(lowidx,highidx)).detach()
            tmp_y = torch.index_select(y,dim=0,index=torch.arange(lowidx,highidx)).detach()
            tmp_mask = torch.index_select(mask,dim=0,index=torch.arange(lowidx,highidx)).detach()
            #%%%% forward pass
            if type(tmp_x)==list:
                tmp_out = model.forward([item_.to(config["device"]) for item_ in tmp_x])
            else:
                tmp_out = model.forward(tmp_x.to(config["device"]))
            #%%%% compute masked loss, normalized by the number of valid pixels
            tmp_loss = loss_function(tmp_out.softmax(1),tmp_y.squeeze(1).to(config["device"]))
            tmp_loss = (tmp_loss*tmp_mask.long().squeeze(1).to(config["device"])).sum() / torch.count_nonzero(tmp_mask.long().to(config["device"]))
            #%%%% compute gradient
            tmp_loss.backward()
            #%%%% collect minibatch output and accumulate the batch loss,
            #%%%% weighting each minibatch by its share of valid pixels
            out.append(tmp_out.detach().cpu())
            loss = loss+torch.count_nonzero(tmp_mask.long().detach().cpu())/torch.count_nonzero(mask.long().detach().cpu())*tmp_loss.detach().cpu()
            #%%%% free memory
            del(tmp_x)
            del(tmp_y)
            del(tmp_mask)
            del(tmp_loss)
            del(tmp_out)
        #%%% update model parameters
        optimizer.step()
        #%%% compute metric
        out = torch.concat(out,dim=0)
        if type(metric)==list:
            train_acc = [metric_(out,y.cpu().detach(),mask.cpu().detach()) for metric_ in metric]
        else:
            train_acc = metric(out,y.cpu().detach(),mask.cpu().detach())
        #%%% print progress
        print(
            "[{}] Training Step: {:d}/{:d} {:d}.{:d}, \tbatch_size: {} \tLoss: {:.4f} \tAcc: {}".format(
                dt.datetime.now().strftime("%Y-%m-%dT%H-%M-%S"),
                logstep+1,
                len(dataloader_train)*config["n_epochs"],
                epoch,
                step,
                config["batch_size"],
                loss.mean(),
                {metric_.__name__:train_acc_ for metric_,train_acc_ in zip(metric,train_acc)} if type(metric)==list else train_acc
            )
        )
        #%%% write to tensorboard
        #%%%% log loss
        writer.add_scalar(f'LossTraining/{type(loss_function).__name__}', loss, global_step=logstep)
        #%%%% log metric
        if type(metric)==list:
            writer.add_scalars('AccuracyTraining',{metric_.__name__:train_acc_ for metric_,train_acc_ in zip(metric,train_acc)},global_step=logstep)
        else:
            writer.add_scalar('AccuracyTraining', train_acc, global_step=logstep)
        #%%%% log gradients
        writer.add_histogram('GradientsTraining/AllParams', model.get_gradient(mode='vec',index=None), global_step=logstep, bins=50, walltime=None, max_bins=100)
        for name,grad in model.get_gradient(mode='named params',device="cpu",detach=True):
            writer.add_histogram(f'NamedGradientsTraining/{name}', grad, global_step=logstep, bins=50, walltime=None, max_bins=100)
    #%%% intermediate evaluation of the validation set
    if config["eval_freq"] and (epoch+1)%config["eval_freq"]==0:
        print()
        model.eval()
        loss_val = []
        acc_val = []
        weights_val = []
        with torch.no_grad():
            fig, axis = plt.subplots(nrows=len(dataloader_validation)*2, ncols=dataloader_validation.batch_size, figsize=(3*dataloader_validation.batch_size,2*3*len(dataloader_validation)))
            fig.suptitle('Validation Data %i'%logstep)
            for step_validation, (x_validation, y_validation, mask_validation, idx_validation) in enumerate(dataloader_validation):
                print('validation step %i'%(step_validation))
                #%%%% clean cache of GPU
                torch.cuda.empty_cache()
                #%%%% forward pass
                if type(x_validation)==list:
                    out_validation = model.forward([item_.to(config["device"]) for item_ in x_validation])
                else:
                    out_validation = model.forward(x_validation.to(config["device"]))
                #%%%% compute loss
                loss_validation = loss_function(out_validation.softmax(1),y_validation.squeeze(1).to(config["device"]))
                loss_validation = (loss_validation*mask_validation.long().squeeze(1).to(config["device"])).sum() / torch.count_nonzero(mask_validation.long().to(config["device"]))
                #%%%% compute metric
                if type(metric)==list:
                    validation_acc = [metric_(out_validation.cpu().detach(),y_validation.cpu().detach(),mask_validation.cpu().detach()) for metric_ in metric]
                else:
                    validation_acc = metric(out_validation.cpu().detach(),y_validation.cpu().detach(),mask_validation.cpu().detach())
                #%%%% print progress
                print(
                    "[{}] Validation Step: {:d}/{:d}, \tbatch_size: {} \tLoss: {:.4f} \tAcc: {}".format(
                        dt.datetime.now().strftime("%Y-%m-%dT%H-%M-%S"),
                        step_validation+1,
                        len(dataloader_validation),
                        dataloader_validation.batch_size,
                        loss_validation.mean(),
                        {metric_.__name__:validation_acc_ for metric_,validation_acc_ in zip(metric,validation_acc)} if type(metric)==list else validation_acc
                    )
                )
                #%%%% collect predictions
                predictions_validation = torch.argmax(out_validation,1).cpu().detach().numpy()
                axis[step_validation*2][0].set_ylabel("Prediction")
                axis[step_validation*2+1][0].set_ylabel("Reference")
                for i in range(dataloader_validation.batch_size):
                    axis[step_validation*2][i].imshow(predictions_validation[i].squeeze(),cmap=config["cmap_reference"],vmin=0,vmax=config["num_classes"])
                    axis[step_validation*2][i].set_xticks([])
                    axis[step_validation*2][i].set_yticks([])
                    axis[step_validation*2+1][i].imshow(y_validation.cpu().detach().numpy()[i].squeeze(),cmap=config["cmap_reference"],vmin=0,vmax=config["num_classes"])
                    axis[step_validation*2+1][i].set_xticks([])
                    axis[step_validation*2+1][i].set_yticks([])
                #%%%% collect loss and accuracy
                loss_val.append(loss_validation.cpu().detach().numpy())
                acc_val.append(validation_acc)
                weights_val.append(torch.count_nonzero(mask_validation).cpu().detach().numpy())
            #%%%% total loss and accuracy, weighted by valid-pixel counts
            total = np.sum([np.sum(weight_) for weight_ in weights_val])
            loss_val_total = np.sum([weight_/total*loss_ for weight_,loss_ in zip(weights_val,loss_val)])
            if type(metric)==list:
                acc_val_total = [np.sum([weight_/total*acc_[i] for weight_,acc_ in zip(weights_val,acc_val)]) for i in range(len(metric))]
            else:
                acc_val_total = np.sum([weight_/total*acc_ for weight_,acc_ in zip(weights_val,acc_val)])
            #%%%% print total values
            print(
                "[{}] Validation: \tTotal Loss: {:.4f} \tTotal Acc: {}".format(
                    dt.datetime.now().strftime("%Y-%m-%dT%H-%M-%S"),
                    loss_val_total,
                    {metric_.__name__:validation_acc_ for metric_,validation_acc_ in zip(metric,acc_val_total)} if type(metric)==list else acc_val_total
                )
            )
            #%%%% write to tensorboard
            #%%%%% log loss
            writer.add_scalar(f'LossValidation/{type(loss_function).__name__}', loss_val_total, global_step=logstep)
            #%%%%% log metric
            if type(metric)==list:
                writer.add_scalars('AccuracyValidation',{metric_.__name__:validation_acc_ for metric_,validation_acc_ in zip(metric,acc_val_total)},global_step=logstep)
            else:
                writer.add_scalar('AccuracyValidation', acc_val_total, global_step=logstep)
            #%%%%% log figure
            plt.tight_layout()
            plt.savefig(fname=os.path.join(config["dir_imgs_validation"],"PredictionValidation_%i"%logstep), dpi="figure")
            writer.add_figure(tag="PredictionValidation", figure=fig, global_step=logstep, close=True, walltime=None)
        model.train()
        print()
        #%%% checkpoint for best validation loss
        if config["checkpoint_bestloss"] and bestloss>loss_val_total:
            bestloss = loss_val_total
            print("New best validation loss! Storing checkpoint and model!")
            model.save_checkpoint(
                savename = os.path.join(config["dir_checkpoints"],'checkpoint_bestloss.tar'),
                epoch = epoch,
                logstep = logstep,
                optimizer_state_dict = optimizer.state_dict(),
                loss = loss,
                bestloss = bestloss,
                bestmetric = acc_val_total # reasonable if someone would like to restart training from that checkpoint
            )
            model.save(savename=os.path.join(config["dir_results"],config["model_savename_inference_bestloss"]),mode='inference')
            model.save(savename=os.path.join(config["dir_results"],config["model_savename_bestloss"]),mode='entirely')
        #%%% checkpoint for best validation metric(s)
        if config["checkpoint_bestmetric"]:
            if type(metric)==list:
                for m_, (metric_,validation_acc_) in enumerate(zip(metric,acc_val_total)):
                    if bestmetric[m_]<validation_acc_:
                        bestmetric[m_] = validation_acc_
                        print(f"New best validation metric {metric_.__name__}! Storing checkpoint and model!")
                        model.save_checkpoint(
                            savename = os.path.join(config["dir_checkpoints"],f'checkpoint_bestmetric_{metric_.__name__}.tar'),
                            epoch = epoch,
                            logstep = logstep,
                            optimizer_state_dict = optimizer.state_dict(),
                            loss = loss,
                            bestloss = loss_val_total, # reasonable if someone would like to restart training from that checkpoint
                            bestmetric = bestmetric
                        )
                        model.save(savename=os.path.join(config["dir_results"],config["model_savename_inference_bestmetric"]+f"_{metric_.__name__}"),mode='inference')
                        model.save(savename=os.path.join(config["dir_results"],config["model_savename_bestmetric"]+f"_{metric_.__name__}"),mode='entirely')
            else:
                if bestmetric<acc_val_total:
                    bestmetric = acc_val_total
                    print("New best validation metric! Storing checkpoint and model!")
                    model.save_checkpoint(
                        savename = os.path.join(config["dir_checkpoints"],'checkpoint_bestmetric.tar'),
                        epoch = epoch,
                        logstep = logstep,
                        optimizer_state_dict = optimizer.state_dict(),
                        loss = loss,
                        bestloss = loss_val_total, # reasonable if someone would like to restart training from that checkpoint
                        bestmetric = bestmetric
                    )
                    model.save(savename=os.path.join(config["dir_results"],config["model_savename_inference_bestmetric"]),mode='inference')
                    model.save(savename=os.path.join(config["dir_results"],config["model_savename_bestmetric"]),mode='entirely')
        #%%% periodic checkpoint
        if config["checkpoint_freq"] and (epoch+1)%config["checkpoint_freq"]==0:
            model.save_checkpoint(
                savename = os.path.join(config["dir_checkpoints"],f'checkpoint_{logstep}_{epoch}_{step}.tar'),
                epoch = epoch,
                logstep = logstep,
                optimizer_state_dict = optimizer.state_dict(),
                loss = loss,
                bestloss = loss_val_total
            )
#%% save model
print('saving final checkpoint!')
model.save_checkpoint(savename=os.path.join(config["dir_checkpoints"],f'checkpoint_{logstep}_{epoch}_{step}.tar'), epoch=epoch, logstep=logstep, optimizer_state_dict=optimizer.state_dict(), loss=loss)
print('saving inference model')
model.save(savename=os.path.join(config["dir_results"],config["model_savename_inference"]),mode='inference')
print('saving entire model')
model.save(savename=os.path.join(config["dir_results"],config["model_savename"]),mode='entirely')
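During validation, the per-batch losses are combined into one total by weighting each batch with its number of valid (unmasked) pixels, so batches with more usable reference data count proportionally more. A minimal numpy sketch of that weighted average, with hypothetical loss values and pixel counts:

```python
import numpy as np

# per-batch masked mean losses and their valid-pixel counts (hypothetical values)
loss_val = np.array([1.1770, 1.6286])
weights_val = np.array([60000, 55000])

# normalize the weights and form the weighted sum, as in the loop above
total = weights_val.sum()
loss_val_total = np.sum(weights_val / total * loss_val)
# equivalent to np.average(loss_val, weights=weights_val)
```

The same weighting is applied to each metric, which is why a batch full of masked-out pixels cannot drag the reported totals around.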
Start training...
epoch 0 step 0
[2023-02-13T15-29-50] Training Step: 1/192 0.0, batch_size: 12 Loss: 1.3909 Acc: {'accuracy': tensor(0.2314), 'cohen_kappa': -0.05557622815866159}
epoch 0 step 1
[2023-02-13T15-29-52] Training Step: 2/192 0.1, batch_size: 12 Loss: 1.2519 Acc: {'accuracy': tensor(0.5437), 'cohen_kappa': 0.2546997744812055}
epoch 0 step 2
[2023-02-13T15-29-54] Training Step: 3/192 0.2, batch_size: 12 Loss: 1.3069 Acc: {'accuracy': tensor(0.4159), 'cohen_kappa': 0.08775329156226808}
epoch 1 step 0
[2023-02-13T15-29-56] Training Step: 4/192 1.0, batch_size: 12 Loss: 1.2904 Acc: {'accuracy': tensor(0.4400), 'cohen_kappa': 0.12432958128214222}
epoch 1 step 1
[2023-02-13T15-29-58] Training Step: 5/192 1.1, batch_size: 12 Loss: 1.1497 Acc: {'accuracy': tensor(0.5859), 'cohen_kappa': 0.3175880981314283}
epoch 1 step 2
[2023-02-13T15-29-59] Training Step: 6/192 1.2, batch_size: 12 Loss: 1.0248 Acc: {'accuracy': tensor(0.7305), 'cohen_kappa': 0.48029103065277545}
validation step 0
[2023-02-13T15-30-01] Validation Step: 1/2, batch_size: 4 Loss: 1.1770 Acc: {'accuracy': tensor(0.5667), 'cohen_kappa': 0.07761842285558429}
validation step 1
[2023-02-13T15-30-01] Validation Step: 2/2, batch_size: 4 Loss: 1.6286 Acc: {'accuracy': tensor(0.1150), 'cohen_kappa': 0.030238133005970225}
[2023-02-13T15-30-01] Validation: Total Loss: 1.3882 Total Acc: {'accuracy': 0.35546315, 'cohen_kappa': 0.05546294690056283}
New best validation loss! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric accuracy! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric cohen_kappa! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
epoch 2 step 0
[2023-02-13T15-30-05] Training Step: 7/192 2.0, batch_size: 12 Loss: 1.2078 Acc: {'accuracy': tensor(0.5255), 'cohen_kappa': 0.26827887659157257}
epoch 2 step 1
[2023-02-13T15-30-07] Training Step: 8/192 2.1, batch_size: 12 Loss: 1.0725 Acc: {'accuracy': tensor(0.6606), 'cohen_kappa': 0.4207766294795455}
epoch 2 step 2
[2023-02-13T15-30-09] Training Step: 9/192 2.2, batch_size: 12 Loss: 1.0924 Acc: {'accuracy': tensor(0.6567), 'cohen_kappa': 0.38959210215618445}
epoch 3 step 0
[2023-02-13T15-30-11] Training Step: 10/192 3.0, batch_size: 12 Loss: 1.0235 Acc: {'accuracy': tensor(0.7188), 'cohen_kappa': 0.5100715106159788}
epoch 3 step 1
[2023-02-13T15-30-12] Training Step: 11/192 3.1, batch_size: 12 Loss: 1.0576 Acc: {'accuracy': tensor(0.6665), 'cohen_kappa': 0.43052683306991246}
epoch 3 step 2
[2023-02-13T15-30-14] Training Step: 12/192 3.2, batch_size: 12 Loss: 1.1070 Acc: {'accuracy': tensor(0.6053), 'cohen_kappa': 0.3970551077101635}
validation step 0
[2023-02-13T15-30-15] Validation Step: 1/2, batch_size: 4 Loss: 1.1561 Acc: {'accuracy': tensor(0.5861), 'cohen_kappa': 0.1837432202180206}
validation step 1
[2023-02-13T15-30-16] Validation Step: 2/2, batch_size: 4 Loss: 1.4820 Acc: {'accuracy': tensor(0.2266), 'cohen_kappa': 0.09576103632142141}
[2023-02-13T15-30-16] Validation: Total Loss: 1.3085 Total Acc: {'accuracy': 0.4180367, 'cohen_kappa': 0.1426019109103221}
New best validation loss! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric accuracy! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric cohen_kappa! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
epoch 4 step 0
[2023-02-13T15-30-20] Training Step: 13/192 4.0, batch_size: 12 Loss: 1.0958 Acc: {'accuracy': tensor(0.6251), 'cohen_kappa': 0.42262457553045585}
epoch 4 step 1
[2023-02-13T15-30-21] Training Step: 14/192 4.1, batch_size: 12 Loss: 1.0808 Acc: {'accuracy': tensor(0.6715), 'cohen_kappa': 0.4779777283634722}
epoch 4 step 2
[2023-02-13T15-30-23] Training Step: 15/192 4.2, batch_size: 12 Loss: 0.9874 Acc: {'accuracy': tensor(0.7719), 'cohen_kappa': 0.6071755504406604}
epoch 5 step 0
[2023-02-13T15-30-25] Training Step: 16/192 5.0, batch_size: 12 Loss: 1.0864 Acc: {'accuracy': tensor(0.6887), 'cohen_kappa': 0.4960389821937736}
epoch 5 step 1
[2023-02-13T15-30-26] Training Step: 17/192 5.1, batch_size: 12 Loss: 1.0566 Acc: {'accuracy': tensor(0.7050), 'cohen_kappa': 0.5257463438889118}
epoch 5 step 2
[2023-02-13T15-30-28] Training Step: 18/192 5.2, batch_size: 12 Loss: 1.0404 Acc: {'accuracy': tensor(0.7006), 'cohen_kappa': 0.5064178498015568}
validation step 0
[2023-02-13T15-30-29] Validation Step: 1/2, batch_size: 4 Loss: 1.1088 Acc: {'accuracy': tensor(0.6374), 'cohen_kappa': 0.39409090465180574}
validation step 1
[2023-02-13T15-30-29] Validation Step: 2/2, batch_size: 4 Loss: 1.1728 Acc: {'accuracy': tensor(0.5797), 'cohen_kappa': 0.372917123934162}
[2023-02-13T15-30-29] Validation: Total Loss: 1.1387 Total Acc: {'accuracy': 0.6103928, 'cohen_kappa': 0.3841898426056335}
New best validation loss! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric accuracy! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric cohen_kappa! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
epoch 6 step 0
[2023-02-13T15-30-34] Training Step: 19/192 6.0, batch_size: 12 Loss: 1.0333 Acc: {'accuracy': tensor(0.7082), 'cohen_kappa': 0.5439953877877608}
epoch 6 step 1
[2023-02-13T15-30-35] Training Step: 20/192 6.1, batch_size: 12 Loss: 0.9699 Acc: {'accuracy': tensor(0.7846), 'cohen_kappa': 0.6457816099441178}
epoch 6 step 2
[2023-02-13T15-30-37] Training Step: 21/192 6.2, batch_size: 12 Loss: 1.0638 Acc: {'accuracy': tensor(0.6825), 'cohen_kappa': 0.4938928627031214}
epoch 7 step 0
[2023-02-13T15-30-39] Training Step: 22/192 7.0, batch_size: 12 Loss: 0.9985 Acc: {'accuracy': tensor(0.7467), 'cohen_kappa': 0.5796006905976263}
epoch 7 step 1
[2023-02-13T15-30-40] Training Step: 23/192 7.1, batch_size: 12 Loss: 1.0097 Acc: {'accuracy': tensor(0.7341), 'cohen_kappa': 0.5894811382773071}
epoch 7 step 2
[2023-02-13T15-30-42] Training Step: 24/192 7.2, batch_size: 12 Loss: 0.9810 Acc: {'accuracy': tensor(0.7683), 'cohen_kappa': 0.6354116920304537}
validation step 0
[2023-02-13T15-30-44] Validation Step: 1/2, batch_size: 4 Loss: 1.2740 Acc: {'accuracy': tensor(0.4519), 'cohen_kappa': 0.14831940364377505}
validation step 1
[2023-02-13T15-30-44] Validation Step: 2/2, batch_size: 4 Loss: 1.0452 Acc: {'accuracy': tensor(0.6973), 'cohen_kappa': 0.4798744850683122}
[2023-02-13T15-30-44] Validation: Total Loss: 1.1670 Total Acc: {'accuracy': 0.5666326, 'cohen_kappa': 0.30335772564606966}
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
epoch 8 step 0
[2023-02-13T15-30-46] Training Step: 25/192 8.0, batch_size: 12 Loss: 1.0354 Acc: {'accuracy': tensor(0.7091), 'cohen_kappa': 0.5514917751169328}
epoch 8 step 1
[2023-02-13T15-30-47] Training Step: 26/192 8.1, batch_size: 12 Loss: 0.9314 Acc: {'accuracy': tensor(0.8112), 'cohen_kappa': 0.6780455239697434}
epoch 8 step 2
[2023-02-13T15-30-49] Training Step: 27/192 8.2, batch_size: 12 Loss: 1.0057 Acc: {'accuracy': tensor(0.7384), 'cohen_kappa': 0.5964450949672223}
epoch 9 step 0
[2023-02-13T15-30-51] Training Step: 28/192 9.0, batch_size: 12 Loss: 1.0137 Acc: {'accuracy': tensor(0.7309), 'cohen_kappa': 0.5903812894133914}
epoch 9 step 1
[2023-02-13T15-30-52] Training Step: 29/192 9.1, batch_size: 12 Loss: 0.9804 Acc: {'accuracy': tensor(0.7617), 'cohen_kappa': 0.6213388239819887}
epoch 9 step 2
[2023-02-13T15-30-54] Training Step: 30/192 9.2, batch_size: 12 Loss: 0.9575 Acc: {'accuracy': tensor(0.7906), 'cohen_kappa': 0.6431456284613131}
validation step 0
[2023-02-13T15-30-55] Validation Step: 1/2, batch_size: 4 Loss: 1.0775 Acc: {'accuracy': tensor(0.6493), 'cohen_kappa': 0.4464086271346316}
validation step 1
[2023-02-13T15-30-55] Validation Step: 2/2, batch_size: 4 Loss: 0.9908 Acc: {'accuracy': tensor(0.7531), 'cohen_kappa': 0.5805579069628093}
[2023-02-13T15-30-55] Validation: Total Loss: 1.0370 Total Acc: {'accuracy': 0.697852, 'cohen_kappa': 0.5091381113368234}
New best validation loss! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric accuracy! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric cohen_kappa! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
epoch 10 step 0
[2023-02-13T15-30-59] Training Step: 31/192 10.0, batch_size: 12 Loss: 1.0150 Acc: {'accuracy': tensor(0.7323), 'cohen_kappa': 0.596329779144826}
epoch 10 step 1
[2023-02-13T15-31-01] Training Step: 32/192 10.1, batch_size: 12 Loss: 0.9456 Acc: {'accuracy': tensor(0.7996), 'cohen_kappa': 0.6690378217850028}
epoch 10 step 2
[2023-02-13T15-31-02] Training Step: 33/192 10.2, batch_size: 12 Loss: 0.9621 Acc: {'accuracy': tensor(0.7809), 'cohen_kappa': 0.6335743324243279}
epoch 11 step 0
[2023-02-13T15-31-04] Training Step: 34/192 11.0, batch_size: 12 Loss: 0.9578 Acc: {'accuracy': tensor(0.7878), 'cohen_kappa': 0.6794396182586451}
epoch 11 step 1
[2023-02-13T15-31-06] Training Step: 35/192 11.1, batch_size: 12 Loss: 0.9286 Acc: {'accuracy': tensor(0.8176), 'cohen_kappa': 0.6746903156680578}
epoch 11 step 2
[2023-02-13T15-31-08] Training Step: 36/192 11.2, batch_size: 12 Loss: 0.9499 Acc: {'accuracy': tensor(0.7900), 'cohen_kappa': 0.6704500809843459}
validation step 0
[2023-02-13T15-31-09] Validation Step: 1/2, batch_size: 4 Loss: 1.1176 Acc: {'accuracy': tensor(0.6073), 'cohen_kappa': 0.37291330048210924}
validation step 1
[2023-02-13T15-31-09] Validation Step: 2/2, batch_size: 4 Loss: 0.9379 Acc: {'accuracy': tensor(0.8066), 'cohen_kappa': 0.627908830706918}
[2023-02-13T15-31-10] Validation: Total Loss: 1.0336 Total Acc: {'accuracy': 0.7005212, 'cohen_kappa': 0.4921516452973367}
New best validation loss! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric accuracy! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
epoch 12 step 0
[2023-02-13T15-31-13] Training Step: 37/192 12.0, batch_size: 12 Loss: 0.9275 Acc: {'accuracy': tensor(0.8184), 'cohen_kappa': 0.7166437784507591}
epoch 12 step 1
[2023-02-13T15-31-14] Training Step: 38/192 12.1, batch_size: 12 Loss: 0.9830 Acc: {'accuracy': tensor(0.7590), 'cohen_kappa': 0.6126842877945431}
epoch 12 step 2
[2023-02-13T15-31-16] Training Step: 39/192 12.2, batch_size: 12 Loss: 0.8911 Acc: {'accuracy': tensor(0.8532), 'cohen_kappa': 0.7614304053733756}
epoch 13 step 0
[2023-02-13T15-31-18] Training Step: 40/192 13.0, batch_size: 12 Loss: 0.9095 Acc: {'accuracy': tensor(0.8350), 'cohen_kappa': 0.7506030131123372}
epoch 13 step 1
[2023-02-13T15-31-20] Training Step: 41/192 13.1, batch_size: 12 Loss: 0.9926 Acc: {'accuracy': tensor(0.7501), 'cohen_kappa': 0.6103423382645203}
epoch 13 step 2
[2023-02-13T15-31-22] Training Step: 42/192 13.2, batch_size: 12 Loss: 0.9717 Acc: {'accuracy': tensor(0.7708), 'cohen_kappa': 0.6309537344687672}
validation step 0
[2023-02-13T15-31-23] Validation Step: 1/2, batch_size: 4 Loss: 1.0275 Acc: {'accuracy': tensor(0.7134), 'cohen_kappa': 0.4990206300740485}
validation step 1
[2023-02-13T15-31-23] Validation Step: 2/2, batch_size: 4 Loss: 1.1336 Acc: {'accuracy': tensor(0.6037), 'cohen_kappa': 0.2560782122857662}
[2023-02-13T15-31-24] Validation: Total Loss: 1.0771 Total Acc: {'accuracy': 0.6620851, 'cohen_kappa': 0.3854184357259929}
epoch 14 step 0
[2023-02-13T15-31-25] Training Step: 43/192 14.0, batch_size: 12 Loss: 0.9303 Acc: {'accuracy': tensor(0.8137), 'cohen_kappa': 0.7002342094296158}
epoch 14 step 1
[2023-02-13T15-31-27] Training Step: 44/192 14.1, batch_size: 12 Loss: 0.9284 Acc: {'accuracy': tensor(0.8178), 'cohen_kappa': 0.6749811491756527}
epoch 14 step 2
[2023-02-13T15-31-28] Training Step: 45/192 14.2, batch_size: 12 Loss: 0.9804 Acc: {'accuracy': tensor(0.7600), 'cohen_kappa': 0.6390093483194795}
epoch 15 step 0
[2023-02-13T15-31-31] Training Step: 46/192 15.0, batch_size: 12 Loss: 0.9810 Acc: {'accuracy': tensor(0.7608), 'cohen_kappa': 0.6011259018941029}
epoch 15 step 1
[2023-02-13T15-31-32] Training Step: 47/192 15.1, batch_size: 12 Loss: 0.9969 Acc: {'accuracy': tensor(0.7458), 'cohen_kappa': 0.5971148325880589}
epoch 15 step 2
[2023-02-13T15-31-34] Training Step: 48/192 15.2, batch_size: 12 Loss: 0.9775 Acc: {'accuracy': tensor(0.7634), 'cohen_kappa': 0.64644965800368}
validation step 0
[2023-02-13T15-31-35] Validation Step: 1/2, batch_size: 4 Loss: 0.9258 Acc: {'accuracy': tensor(0.8217), 'cohen_kappa': 0.6933596697471072}
validation step 1
[2023-02-13T15-31-35] Validation Step: 2/2, batch_size: 4 Loss: 1.0344 Acc: {'accuracy': tensor(0.7032), 'cohen_kappa': 0.3967198615896762}
[2023-02-13T15-31-35] Validation: Total Loss: 0.9766 Total Acc: {'accuracy': 0.76629746, 'cohen_kappa': 0.5546480629208421}
New best validation loss! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric accuracy! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric cohen_kappa! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
epoch 16 step 0
[2023-02-13T15-31-40] Training Step: 49/192 16.0, batch_size: 12 Loss: 0.9072 Acc: {'accuracy': tensor(0.8365), 'cohen_kappa': 0.7411572348413954}
epoch 16 step 1
[2023-02-13T15-31-42] Training Step: 50/192 16.1, batch_size: 12 Loss: 0.9061 Acc: {'accuracy': tensor(0.8384), 'cohen_kappa': 0.7408152261223787}
epoch 16 step 2
[2023-02-13T15-31-43] Training Step: 51/192 16.2, batch_size: 12 Loss: 0.9671 Acc: {'accuracy': tensor(0.7763), 'cohen_kappa': 0.6324787150886226}
epoch 17 step 0
[2023-02-13T15-31-46] Training Step: 52/192 17.0, batch_size: 12 Loss: 0.9449 Acc: {'accuracy': tensor(0.7960), 'cohen_kappa': 0.6890508872487974}
epoch 17 step 1
[2023-02-13T15-31-48] Training Step: 53/192 17.1, batch_size: 12 Loss: 0.9047 Acc: {'accuracy': tensor(0.8385), 'cohen_kappa': 0.7474719785004946}
epoch 17 step 2
[2023-02-13T15-31-49] Training Step: 54/192 17.2, batch_size: 12 Loss: 0.9182 Acc: {'accuracy': tensor(0.8299), 'cohen_kappa': 0.7080361702570148}
validation step 0
[2023-02-13T15-31-51] Validation Step: 1/2, batch_size: 4 Loss: 0.9965 Acc: {'accuracy': tensor(0.7502), 'cohen_kappa': 0.5853288541168569}
validation step 1
[2023-02-13T15-31-51] Validation Step: 2/2, batch_size: 4 Loss: 0.9727 Acc: {'accuracy': tensor(0.7666), 'cohen_kappa': 0.5230621528249373}
[2023-02-13T15-31-51] Validation: Total Loss: 0.9854 Total Acc: {'accuracy': 0.75782984, 'cohen_kappa': 0.5562123500251693}
New best validation metric cohen_kappa! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
epoch 18 step 0
[2023-02-13T15-31-54] Training Step: 55/192 18.0, batch_size: 12 Loss: 0.9477 Acc: {'accuracy': tensor(0.8014), 'cohen_kappa': 0.6619511495632975}
epoch 18 step 1
[2023-02-13T15-31-55] Training Step: 56/192 18.1, batch_size: 12 Loss: 0.9407 Acc: {'accuracy': tensor(0.8045), 'cohen_kappa': 0.7045306709947103}
epoch 18 step 2
[2023-02-13T15-31-57] Training Step: 57/192 18.2, batch_size: 12 Loss: 0.9467 Acc: {'accuracy': tensor(0.7983), 'cohen_kappa': 0.6778089945406558}
epoch 19 step 0
[2023-02-13T15-31-59] Training Step: 58/192 19.0, batch_size: 12 Loss: 0.9264 Acc: {'accuracy': tensor(0.8173), 'cohen_kappa': 0.7264700560914203}
epoch 19 step 1
[2023-02-13T15-32-01] Training Step: 59/192 19.1, batch_size: 12 Loss: 0.8957 Acc: {'accuracy': tensor(0.8464), 'cohen_kappa': 0.7211029659367661}
epoch 19 step 2
[2023-02-13T15-32-03] Training Step: 60/192 19.2, batch_size: 12 Loss: 0.9155 Acc: {'accuracy': tensor(0.8271), 'cohen_kappa': 0.7264010708306672}
validation step 0
[2023-02-13T15-32-04] Validation Step: 1/2, batch_size: 4 Loss: 0.9064 Acc: {'accuracy': tensor(0.8483), 'cohen_kappa': 0.7476466026801813}
validation step 1
[2023-02-13T15-32-04] Validation Step: 2/2, batch_size: 4 Loss: 1.0271 Acc: {'accuracy': tensor(0.7138), 'cohen_kappa': 0.502610188849713}
[2023-02-13T15-32-04] Validation: Total Loss: 0.9628 Total Acc: {'accuracy': 0.7854378, 'cohen_kappa': 0.6330652357878078}
New best validation loss! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric accuracy! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric cohen_kappa! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
epoch 20 step 0
[2023-02-13T15-32-09] Training Step: 61/192 20.0, batch_size: 12 Loss: 0.9451 Acc: {'accuracy': tensor(0.7965), 'cohen_kappa': 0.6884061726464226}
epoch 20 step 1
[2023-02-13T15-32-11] Training Step: 62/192 20.1, batch_size: 12 Loss: 0.8826 Acc: {'accuracy': tensor(0.8611), 'cohen_kappa': 0.758370248132815}
epoch 20 step 2
[2023-02-13T15-32-13] Training Step: 63/192 20.2, batch_size: 12 Loss: 0.9447 Acc: {'accuracy': tensor(0.7961), 'cohen_kappa': 0.686921438372986}
epoch 21 step 0
[2023-02-13T15-32-15] Training Step: 64/192 21.0, batch_size: 12 Loss: 0.9008 Acc: {'accuracy': tensor(0.8409), 'cohen_kappa': 0.7508842547707282}
epoch 21 step 1
[2023-02-13T15-32-17] Training Step: 65/192 21.1, batch_size: 12 Loss: 0.9449 Acc: {'accuracy': tensor(0.7965), 'cohen_kappa': 0.6659262541326325}
epoch 21 step 2
[2023-02-13T15-32-19] Training Step: 66/192 21.2, batch_size: 12 Loss: 0.9399 Acc: {'accuracy': tensor(0.8010), 'cohen_kappa': 0.6977305712237294}
validation step 0
[2023-02-13T15-32-20] Validation Step: 1/2, batch_size: 4 Loss: 0.8512 Acc: {'accuracy': tensor(0.8932), 'cohen_kappa': 0.8129253912230852}
validation step 1
[2023-02-13T15-32-20] Validation Step: 2/2, batch_size: 4 Loss: 0.9580 Acc: {'accuracy': tensor(0.7838), 'cohen_kappa': 0.5623318643272017}
[2023-02-13T15-32-20] Validation: Total Loss: 0.9012 Total Acc: {'accuracy': 0.8420067, 'cohen_kappa': 0.695745465197981}
New best validation loss! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric accuracy! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric cohen_kappa! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
epoch 22 step 0
[2023-02-13T15-32-25] Training Step: 67/192 22.0, batch_size: 12 Loss: 0.9054 Acc: {'accuracy': tensor(0.8369), 'cohen_kappa': 0.6924020016666204}
epoch 22 step 1
[2023-02-13T15-32-26] Training Step: 68/192 22.1, batch_size: 12 Loss: 0.9208 Acc: {'accuracy': tensor(0.8210), 'cohen_kappa': 0.7214446435444388}
epoch 22 step 2
[2023-02-13T15-32-28] Training Step: 69/192 22.2, batch_size: 12 Loss: 0.9207 Acc: {'accuracy': tensor(0.8223), 'cohen_kappa': 0.7345901652190887}
epoch 23 step 0
[2023-02-13T15-32-30] Training Step: 70/192 23.0, batch_size: 12 Loss: 0.8815 Acc: {'accuracy': tensor(0.8617), 'cohen_kappa': 0.7653637547300507}
epoch 23 step 1
[2023-02-13T15-32-32] Training Step: 71/192 23.1, batch_size: 12 Loss: 0.9162 Acc: {'accuracy': tensor(0.8262), 'cohen_kappa': 0.7284774865260117}
epoch 23 step 2
[2023-02-13T15-32-34] Training Step: 72/192 23.2, batch_size: 12 Loss: 0.9277 Acc: {'accuracy': tensor(0.8161), 'cohen_kappa': 0.7196271393800199}
validation step 0
[2023-02-13T15-32-35] Validation Step: 1/2, batch_size: 4 Loss: 1.0437 Acc: {'accuracy': tensor(0.6922), 'cohen_kappa': 0.5401980982032146}
validation step 1
[2023-02-13T15-32-35] Validation Step: 2/2, batch_size: 4 Loss: 0.9584 Acc: {'accuracy': tensor(0.7784), 'cohen_kappa': 0.5731854695310097}
[2023-02-13T15-32-35] Validation: Total Loss: 1.0038 Total Acc: {'accuracy': 0.7324784, 'cohen_kappa': 0.5556233080234175}
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
epoch 24 step 0
[2023-02-13T15-32-37] Training Step: 73/192 24.0, batch_size: 12 Loss: 0.9113 Acc: {'accuracy': tensor(0.8306), 'cohen_kappa': 0.7258560516736612}
epoch 24 step 1
[2023-02-13T15-32-39] Training Step: 74/192 24.1, batch_size: 12 Loss: 0.9076 Acc: {'accuracy': tensor(0.8359), 'cohen_kappa': 0.7451256367584911}
epoch 24 step 2
[2023-02-13T15-32-41] Training Step: 75/192 24.2, batch_size: 12 Loss: 0.9105 Acc: {'accuracy': tensor(0.8322), 'cohen_kappa': 0.7330999673571206}
epoch 25 step 0
[2023-02-13T15-32-43] Training Step: 76/192 25.0, batch_size: 12 Loss: 0.9072 Acc: {'accuracy': tensor(0.8354), 'cohen_kappa': 0.7411250430168612}
epoch 25 step 1
[2023-02-13T15-32-45] Training Step: 77/192 25.1, batch_size: 12 Loss: 0.8860 Acc: {'accuracy': tensor(0.8598), 'cohen_kappa': 0.7832929728974216}
epoch 25 step 2
[2023-02-13T15-32-46] Training Step: 78/192 25.2, batch_size: 12 Loss: 0.9418 Acc: {'accuracy': tensor(0.7981), 'cohen_kappa': 0.6709183786183947}
validation step 0
[2023-02-13T15-32-48] Validation Step: 1/2, batch_size: 4 Loss: 0.8908 Acc: {'accuracy': tensor(0.8615), 'cohen_kappa': 0.76976191160364}
validation step 1
[2023-02-13T15-32-48] Validation Step: 2/2, batch_size: 4 Loss: 0.9800 Acc: {'accuracy': tensor(0.7633), 'cohen_kappa': 0.5884229627519881}
[2023-02-13T15-32-48] Validation: Total Loss: 0.9325 Total Acc: {'accuracy': 0.81558216, 'cohen_kappa': 0.6849660875649155}
epoch 26 step 0
[2023-02-13T15-32-50] Training Step: 79/192 26.0, batch_size: 12 Loss: 0.9134 Acc: {'accuracy': tensor(0.8309), 'cohen_kappa': 0.6976810192285832}
epoch 26 step 1
[2023-02-13T15-32-51] Training Step: 80/192 26.1, batch_size: 12 Loss: 0.9072 Acc: {'accuracy': tensor(0.8352), 'cohen_kappa': 0.7490830466060858}
epoch 26 step 2
[2023-02-13T15-32-53] Training Step: 81/192 26.2, batch_size: 12 Loss: 0.9343 Acc: {'accuracy': tensor(0.8086), 'cohen_kappa': 0.700969530020435}
epoch 27 step 0
[2023-02-13T15-32-55] Training Step: 82/192 27.0, batch_size: 12 Loss: 0.9017 Acc: {'accuracy': tensor(0.8407), 'cohen_kappa': 0.732908722088}
epoch 27 step 1
[2023-02-13T15-32-56] Training Step: 83/192 27.1, batch_size: 12 Loss: 0.9279 Acc: {'accuracy': tensor(0.8142), 'cohen_kappa': 0.6916977958206463}
epoch 27 step 2
[2023-02-13T15-32-58] Training Step: 84/192 27.2, batch_size: 12 Loss: 0.9111 Acc: {'accuracy': tensor(0.8303), 'cohen_kappa': 0.7361887330104406}
validation step 0
[2023-02-13T15-32-59] Validation Step: 1/2, batch_size: 4 Loss: 0.9558 Acc: {'accuracy': tensor(0.7890), 'cohen_kappa': 0.6476109860558075}
validation step 1
[2023-02-13T15-33-00] Validation Step: 2/2, batch_size: 4 Loss: 0.9472 Acc: {'accuracy': tensor(0.7940), 'cohen_kappa': 0.5993394109905691}
[2023-02-13T15-33-00] Validation: Total Loss: 0.9518 Total Acc: {'accuracy': 0.7913485, 'cohen_kappa': 0.625038736623323}
epoch 28 step 0
[2023-02-13T15-33-01] Training Step: 85/192 28.0, batch_size: 12 Loss: 0.9064 Acc: {'accuracy': tensor(0.8359), 'cohen_kappa': 0.7375208938637605}
epoch 28 step 1
[2023-02-13T15-33-03] Training Step: 86/192 28.1, batch_size: 12 Loss: 0.8919 Acc: {'accuracy': tensor(0.8511), 'cohen_kappa': 0.7650360091370365}
epoch 28 step 2
[2023-02-13T15-33-05] Training Step: 87/192 28.2, batch_size: 12 Loss: 0.9126 Acc: {'accuracy': tensor(0.8306), 'cohen_kappa': 0.7373209726110648}
epoch 29 step 0
[2023-02-13T15-33-07] Training Step: 88/192 29.0, batch_size: 12 Loss: 0.9338 Acc: {'accuracy': tensor(0.8071), 'cohen_kappa': 0.7067882347720389}
epoch 29 step 1
[2023-02-13T15-33-08] Training Step: 89/192 29.1, batch_size: 12 Loss: 0.8941 Acc: {'accuracy': tensor(0.8497), 'cohen_kappa': 0.7426306433516969}
epoch 29 step 2
[2023-02-13T15-33-10] Training Step: 90/192 29.2, batch_size: 12 Loss: 0.9107 Acc: {'accuracy': tensor(0.8319), 'cohen_kappa': 0.7357869798525072}
validation step 0
[2023-02-13T15-33-11] Validation Step: 1/2, batch_size: 4 Loss: 1.0531 Acc: {'accuracy': tensor(0.6812), 'cohen_kappa': 0.5007139319055034}
validation step 1
[2023-02-13T15-33-11] Validation Step: 2/2, batch_size: 4 Loss: 0.9937 Acc: {'accuracy': tensor(0.7462), 'cohen_kappa': 0.45572799104324213}
[2023-02-13T15-33-12] Validation: Total Loss: 1.0253 Total Acc: {'accuracy': 0.7116252, 'cohen_kappa': 0.4796780763999473}
epoch 30 step 0
[2023-02-13T15-33-13] Training Step: 91/192 30.0, batch_size: 12 Loss: 0.9445 Acc: {'accuracy': tensor(0.7981), 'cohen_kappa': 0.6956995680521212}
epoch 30 step 1
[2023-02-13T15-33-15] Training Step: 92/192 30.1, batch_size: 12 Loss: 0.9210 Acc: {'accuracy': tensor(0.8225), 'cohen_kappa': 0.693554272574971}
epoch 30 step 2
[2023-02-13T15-33-17] Training Step: 93/192 30.2, batch_size: 12 Loss: 0.9031 Acc: {'accuracy': tensor(0.8390), 'cohen_kappa': 0.7265383835484511}
epoch 31 step 0
[2023-02-13T15-33-19] Training Step: 94/192 31.0, batch_size: 12 Loss: 0.8890 Acc: {'accuracy': tensor(0.8536), 'cohen_kappa': 0.7424369072693489}
epoch 31 step 1
[2023-02-13T15-33-21] Training Step: 95/192 31.1, batch_size: 12 Loss: 0.8885 Acc: {'accuracy': tensor(0.8541), 'cohen_kappa': 0.7707602873667729}
epoch 31 step 2
[2023-02-13T15-33-22] Training Step: 96/192 31.2, batch_size: 12 Loss: 0.8931 Acc: {'accuracy': tensor(0.8507), 'cohen_kappa': 0.774084828391289}
validation step 0
[2023-02-13T15-33-24] Validation Step: 1/2, batch_size: 4 Loss: 1.0192 Acc: {'accuracy': tensor(0.7248), 'cohen_kappa': 0.5193695863699821}
validation step 1
[2023-02-13T15-33-24] Validation Step: 2/2, batch_size: 4 Loss: 1.0946 Acc: {'accuracy': tensor(0.6460), 'cohen_kappa': 0.18919767477285143}
[2023-02-13T15-33-24] Validation: Total Loss: 1.0545 Total Acc: {'accuracy': 0.6879435, 'cohen_kappa': 0.3649780477864906}
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
epoch 32 step 0
[2023-02-13T15-33-26] Training Step: 97/192 32.0, batch_size: 12 Loss: 0.8860 Acc: {'accuracy': tensor(0.8569), 'cohen_kappa': 0.7750248957732473}
epoch 32 step 1
[2023-02-13T15-33-28] Training Step: 98/192 32.1, batch_size: 12 Loss: 0.8952 Acc: {'accuracy': tensor(0.8474), 'cohen_kappa': 0.7659463058108047}
epoch 32 step 2
[2023-02-13T15-33-29] Training Step: 99/192 32.2, batch_size: 12 Loss: 0.8883 Acc: {'accuracy': tensor(0.8565), 'cohen_kappa': 0.7678961603625809}
epoch 33 step 0
[2023-02-13T15-33-32] Training Step: 100/192 33.0, batch_size: 12 Loss: 0.8824 Acc: {'accuracy': tensor(0.8597), 'cohen_kappa': 0.7752160878085741}
epoch 33 step 1
[2023-02-13T15-33-34] Training Step: 101/192 33.1, batch_size: 12 Loss: 0.8910 Acc: {'accuracy': tensor(0.8523), 'cohen_kappa': 0.7718504891955391}
epoch 33 step 2
[2023-02-13T15-33-36] Training Step: 102/192 33.2, batch_size: 12 Loss: 0.8860 Acc: {'accuracy': tensor(0.8590), 'cohen_kappa': 0.7670584029364274}
validation step 0
[2023-02-13T15-33-37] Validation Step: 1/2, batch_size: 4 Loss: 0.9607 Acc: {'accuracy': tensor(0.7787), 'cohen_kappa': 0.6529883030801146}
validation step 1
[2023-02-13T15-33-37] Validation Step: 2/2, batch_size: 4 Loss: 0.9406 Acc: {'accuracy': tensor(0.8008), 'cohen_kappa': 0.611371459470869}
[2023-02-13T15-33-37] Validation: Total Loss: 0.9513 Total Acc: {'accuracy': 0.7890247, 'cohen_kappa': 0.6335278696206507}
epoch 34 step 0
[2023-02-13T15-33-39] Training Step: 103/192 34.0, batch_size: 12 Loss: 0.9018 Acc: {'accuracy': tensor(0.8421), 'cohen_kappa': 0.7560377629704754}
epoch 34 step 1
[2023-02-13T15-33-41] Training Step: 104/192 34.1, batch_size: 12 Loss: 0.8629 Acc: {'accuracy': tensor(0.8789), 'cohen_kappa': 0.7865822668886376}
epoch 34 step 2
[2023-02-13T15-33-42] Training Step: 105/192 34.2, batch_size: 12 Loss: 0.8916 Acc: {'accuracy': tensor(0.8503), 'cohen_kappa': 0.7714599988740917}
epoch 35 step 0
[2023-02-13T15-33-44] Training Step: 106/192 35.0, batch_size: 12 Loss: 0.8873 Acc: {'accuracy': tensor(0.8552), 'cohen_kappa': 0.7722990233218369}
epoch 35 step 1
[2023-02-13T15-33-46] Training Step: 107/192 35.1, batch_size: 12 Loss: 0.8835 Acc: {'accuracy': tensor(0.8590), 'cohen_kappa': 0.7836080905996652}
epoch 35 step 2
[2023-02-13T15-33-48] Training Step: 108/192 35.2, batch_size: 12 Loss: 0.9251 Acc: {'accuracy': tensor(0.8186), 'cohen_kappa': 0.711766301636827}
validation step 0
[2023-02-13T15-33-50] Validation Step: 1/2, batch_size: 4 Loss: 1.0035 Acc: {'accuracy': tensor(0.7276), 'cohen_kappa': 0.5365724577805897}
validation step 1
[2023-02-13T15-33-50] Validation Step: 2/2, batch_size: 4 Loss: 1.0449 Acc: {'accuracy': tensor(0.6905), 'cohen_kappa': 0.30756499239970947}
[2023-02-13T15-33-50] Validation: Total Loss: 1.0229 Total Acc: {'accuracy': 0.71026003, 'cohen_kappa': 0.42948638023925884}
epoch 36 step 0
[2023-02-13T15-33-51] Training Step: 109/192 36.0, batch_size: 12 Loss: 0.9108 Acc: {'accuracy': tensor(0.8335), 'cohen_kappa': 0.7143996699382182}
epoch 36 step 1
[2023-02-13T15-33-53] Training Step: 110/192 36.1, batch_size: 12 Loss: 0.8524 Acc: {'accuracy': tensor(0.8917), 'cohen_kappa': 0.8236461660906257}
epoch 36 step 2
[2023-02-13T15-33-55] Training Step: 111/192 36.2, batch_size: 12 Loss: 0.9022 Acc: {'accuracy': tensor(0.8407), 'cohen_kappa': 0.7625193022972058}
epoch 37 step 0
[2023-02-13T15-33-57] Training Step: 112/192 37.0, batch_size: 12 Loss: 0.8693 Acc: {'accuracy': tensor(0.8730), 'cohen_kappa': 0.7984373620851993}
epoch 37 step 1
[2023-02-13T15-33-58] Training Step: 113/192 37.1, batch_size: 12 Loss: 0.9493 Acc: {'accuracy': tensor(0.7913), 'cohen_kappa': 0.6620778111068322}
epoch 37 step 2
[2023-02-13T15-34-00] Training Step: 114/192 37.2, batch_size: 12 Loss: 0.9051 Acc: {'accuracy': tensor(0.8407), 'cohen_kappa': 0.751104493099796}
validation step 0
[2023-02-13T15-34-01] Validation Step: 1/2, batch_size: 4 Loss: 1.0933 Acc: {'accuracy': tensor(0.6189), 'cohen_kappa': 0.3754395555655672}
validation step 1
[2023-02-13T15-34-02] Validation Step: 2/2, batch_size: 4 Loss: 1.0295 Acc: {'accuracy': tensor(0.7030), 'cohen_kappa': 0.3528113216969123}
[2023-02-13T15-34-02] Validation: Total Loss: 1.0635 Total Acc: {'accuracy': 0.6581998, 'cohen_kappa': 0.3648583773378095}
epoch 38 step 0
[2023-02-13T15-34-03] Training Step: 115/192 38.0, batch_size: 12 Loss: 0.9273 Acc: {'accuracy': tensor(0.8152), 'cohen_kappa': 0.7143851608275451}
epoch 38 step 1
[2023-02-13T15-34-05] Training Step: 116/192 38.1, batch_size: 12 Loss: 0.8897 Acc: {'accuracy': tensor(0.8537), 'cohen_kappa': 0.7603776511432832}
epoch 38 step 2
[2023-02-13T15-34-06] Training Step: 117/192 38.2, batch_size: 12 Loss: 0.8907 Acc: {'accuracy': tensor(0.8512), 'cohen_kappa': 0.7731028702869078}
epoch 39 step 0
[2023-02-13T15-34-08] Training Step: 118/192 39.0, batch_size: 12 Loss: 0.8700 Acc: {'accuracy': tensor(0.8723), 'cohen_kappa': 0.7862627600912704}
epoch 39 step 1
[2023-02-13T15-34-10] Training Step: 119/192 39.1, batch_size: 12 Loss: 0.8714 Acc: {'accuracy': tensor(0.8707), 'cohen_kappa': 0.779709779822463}
epoch 39 step 2
[2023-02-13T15-34-11] Training Step: 120/192 39.2, batch_size: 12 Loss: 0.9338 Acc: {'accuracy': tensor(0.8073), 'cohen_kappa': 0.7127169828948599}
validation step 0
[2023-02-13T15-34-13] Validation Step: 1/2, batch_size: 4 Loss: 0.8695 Acc: {'accuracy': tensor(0.8738), 'cohen_kappa': 0.7877554263907438}
validation step 1
[2023-02-13T15-34-13] Validation Step: 2/2, batch_size: 4 Loss: 0.9693 Acc: {'accuracy': tensor(0.7716), 'cohen_kappa': 0.5944937673787141}
[2023-02-13T15-34-13] Validation: Total Loss: 0.9162 Total Acc: {'accuracy': 0.82602197, 'cohen_kappa': 0.6973844292343501}
New best validation metric cohen_kappa! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
epoch 40 step 0
[2023-02-13T15-34-16] Training Step: 121/192 40.0, batch_size: 12 Loss: 0.8768 Acc: {'accuracy': tensor(0.8661), 'cohen_kappa': 0.7977954325209495}
epoch 40 step 1
[2023-02-13T15-34-18] Training Step: 122/192 40.1, batch_size: 12 Loss: 0.8621 Acc: {'accuracy': tensor(0.8808), 'cohen_kappa': 0.8077586552149943}
epoch 40 step 2
[2023-02-13T15-34-20] Training Step: 123/192 40.2, batch_size: 12 Loss: 0.8908 Acc: {'accuracy': tensor(0.8513), 'cohen_kappa': 0.7577873258955465}
epoch 41 step 0
[2023-02-13T15-34-22] Training Step: 124/192 41.0, batch_size: 12 Loss: 0.9027 Acc: {'accuracy': tensor(0.8397), 'cohen_kappa': 0.7608068245950764}
epoch 41 step 1
[2023-02-13T15-34-24] Training Step: 125/192 41.1, batch_size: 12 Loss: 0.9035 Acc: {'accuracy': tensor(0.8406), 'cohen_kappa': 0.729719682281992}
epoch 41 step 2
[2023-02-13T15-34-26] Training Step: 126/192 41.2, batch_size: 12 Loss: 0.8698 Acc: {'accuracy': tensor(0.8738), 'cohen_kappa': 0.7908415931430759}
validation step 0
[2023-02-13T15-34-27] Validation Step: 1/2, batch_size: 4 Loss: 0.8748 Acc: {'accuracy': tensor(0.8692), 'cohen_kappa': 0.7769642285955033}
validation step 1
[2023-02-13T15-34-27] Validation Step: 2/2, batch_size: 4 Loss: 0.9476 Acc: {'accuracy': tensor(0.7931), 'cohen_kappa': 0.6003806067229731}
[2023-02-13T15-34-28] Validation: Total Loss: 0.9088 Total Acc: {'accuracy': 0.83360857, 'cohen_kappa': 0.6943920408595241}
epoch 42 step 0
[2023-02-13T15-34-29] Training Step: 127/192 42.0, batch_size: 12 Loss: 0.8837 Acc: {'accuracy': tensor(0.8583), 'cohen_kappa': 0.7662791148103731}
epoch 42 step 1
[2023-02-13T15-34-31] Training Step: 128/192 42.1, batch_size: 12 Loss: 0.8785 Acc: {'accuracy': tensor(0.8626), 'cohen_kappa': 0.7794225728681328}
epoch 42 step 2
[2023-02-13T15-34-33] Training Step: 129/192 42.2, batch_size: 12 Loss: 0.9231 Acc: {'accuracy': tensor(0.8189), 'cohen_kappa': 0.7212535679174785}
epoch 43 step 0
[2023-02-13T15-34-35] Training Step: 130/192 43.0, batch_size: 12 Loss: 0.8819 Acc: {'accuracy': tensor(0.8601), 'cohen_kappa': 0.7605760051182822}
epoch 43 step 1
[2023-02-13T15-34-37] Training Step: 131/192 43.1, batch_size: 12 Loss: 0.9123 Acc: {'accuracy': tensor(0.8306), 'cohen_kappa': 0.7426007639020975}
epoch 43 step 2
[2023-02-13T15-34-39] Training Step: 132/192 43.2, batch_size: 12 Loss: 0.8940 Acc: {'accuracy': tensor(0.8478), 'cohen_kappa': 0.767346267780769}
validation step 0
[2023-02-13T15-34-41] Validation Step: 1/2, batch_size: 4 Loss: 0.8268 Acc: {'accuracy': tensor(0.9158), 'cohen_kappa': 0.8527111233761231}
validation step 1
[2023-02-13T15-34-41] Validation Step: 2/2, batch_size: 4 Loss: 0.9458 Acc: {'accuracy': tensor(0.7939), 'cohen_kappa': 0.5875196672609027}
[2023-02-13T15-34-41] Validation: Total Loss: 0.8824 Total Acc: {'accuracy': 0.858815, 'cohen_kappa': 0.7287050662564503}
New best validation loss! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric accuracy! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
New best validation metric cohen_kappa! Storing checkpoint and model!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saveME_torch: start saving of the entire model!
saveME_torch: saved
epoch 44 step 0
[2023-02-13T15-34-45] Training Step: 133/192 44.0, batch_size: 12 Loss: 0.8786 Acc: {'accuracy': tensor(0.8639), 'cohen_kappa': 0.7941555302863819}
epoch 44 step 1
[2023-02-13T15-34-47] Training Step: 134/192 44.1, batch_size: 12 Loss: 0.9001 Acc: {'accuracy': tensor(0.8429), 'cohen_kappa': 0.7465306699202661}
epoch 44 step 2
[2023-02-13T15-34-48] Training Step: 135/192 44.2, batch_size: 12 Loss: 0.8749 Acc: {'accuracy': tensor(0.8671), 'cohen_kappa': 0.7718372683147114}
epoch 45 step 0
[2023-02-13T15-34-51] Training Step: 136/192 45.0, batch_size: 12 Loss: 0.9159 Acc: {'accuracy': tensor(0.8248), 'cohen_kappa': 0.7325705799877329}
epoch 45 step 1
[2023-02-13T15-34-52] Training Step: 137/192 45.1, batch_size: 12 Loss: 0.8859 Acc: {'accuracy': tensor(0.8570), 'cohen_kappa': 0.752529734330956}
epoch 45 step 2
[2023-02-13T15-34-54] Training Step: 138/192 45.2, batch_size: 12 Loss: 0.8527 Acc: {'accuracy': tensor(0.8904), 'cohen_kappa': 0.8230836404047003}
validation step 0
[2023-02-13T15-34-55] Validation Step: 1/2, batch_size: 4 Loss: 1.0047 Acc: {'accuracy': tensor(0.7587), 'cohen_kappa': 0.5955601680925017}
validation step 1
[2023-02-13T15-34-56] Validation Step: 2/2, batch_size: 4 Loss: 0.9947 Acc: {'accuracy': tensor(0.7452), 'cohen_kappa': 0.4434123842807809}
[2023-02-13T15-34-56] Validation: Total Loss: 1.0001 Total Acc: {'accuracy': 0.75236475, 'cohen_kappa': 0.5244144115648464}
epoch 46 step 0
[2023-02-13T15-34-57] Training Step: 139/192 46.0, batch_size: 12 Loss: 0.9026 Acc: {'accuracy': tensor(0.8409), 'cohen_kappa': 0.7551895802651266}
epoch 46 step 1
[2023-02-13T15-34-59] Training Step: 140/192 46.1, batch_size: 12 Loss: 0.8764 Acc: {'accuracy': tensor(0.8650), 'cohen_kappa': 0.7838902443764181}
epoch 46 step 2
[2023-02-13T15-35-00] Training Step: 141/192 46.2, batch_size: 12 Loss: 0.8826 Acc: {'accuracy': tensor(0.8599), 'cohen_kappa': 0.7807473826717046}
epoch 47 step 0
[2023-02-13T15-35-03] Training Step: 142/192 47.0, batch_size: 12 Loss: 0.9044 Acc: {'accuracy': tensor(0.8397), 'cohen_kappa': 0.7383520624233151}
epoch 47 step 1
[2023-02-13T15-35-04] Training Step: 143/192 47.1, batch_size: 12 Loss: 0.8909 Acc: {'accuracy': tensor(0.8530), 'cohen_kappa': 0.7766442827616414}
epoch 47 step 2
[2023-02-13T15-35-06] Training Step: 144/192 47.2, batch_size: 12 Loss: 0.8926 Acc: {'accuracy': tensor(0.8495), 'cohen_kappa': 0.7551329272668602}
validation step 0
[2023-02-13T15-35-07] Validation Step: 1/2, batch_size: 4 Loss: 0.8839 Acc: {'accuracy': tensor(0.8577), 'cohen_kappa': 0.7622122320447309}
validation step 1
[2023-02-13T15-35-08] Validation Step: 2/2, batch_size: 4 Loss: 0.9491 Acc: {'accuracy': tensor(0.7909), 'cohen_kappa': 0.5860742081024162}
[2023-02-13T15-35-08] Validation: Total Loss: 0.9144 Total Acc: {'accuracy': 0.8264491, 'cohen_kappa': 0.6798484101558123}
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
epoch 48 step 0
[2023-02-13T15-35-10] Training Step: 145/192 48.0, batch_size: 12 Loss: 0.8765 Acc: {'accuracy': tensor(0.8655), 'cohen_kappa': 0.7837166168258277}
epoch 48 step 1
[2023-02-13T15-35-11] Training Step: 146/192 48.1, batch_size: 12 Loss: 0.8838 Acc: {'accuracy': tensor(0.8583), 'cohen_kappa': 0.7819388447096857}
epoch 48 step 2
[2023-02-13T15-35-13] Training Step: 147/192 48.2, batch_size: 12 Loss: 0.8550 Acc: {'accuracy': tensor(0.8882), 'cohen_kappa': 0.8214588002457232}
epoch 49 step 0
[2023-02-13T15-35-15] Training Step: 148/192 49.0, batch_size: 12 Loss: 0.8899 Acc: {'accuracy': tensor(0.8524), 'cohen_kappa': 0.7721353962862763}
epoch 49 step 1
[2023-02-13T15-35-16] Training Step: 149/192 49.1, batch_size: 12 Loss: 0.8831 Acc: {'accuracy': tensor(0.8606), 'cohen_kappa': 0.7541379782904627}
epoch 49 step 2
[2023-02-13T15-35-18] Training Step: 150/192 49.2, batch_size: 12 Loss: 0.8699 Acc: {'accuracy': tensor(0.8729), 'cohen_kappa': 0.8069924373796857}
validation step 0
[2023-02-13T15-35-20] Validation Step: 1/2, batch_size: 4 Loss: 0.9680 Acc: {'accuracy': tensor(0.7779), 'cohen_kappa': 0.6510732434906241}
validation step 1
[2023-02-13T15-35-20] Validation Step: 2/2, batch_size: 4 Loss: 0.9496 Acc: {'accuracy': tensor(0.7902), 'cohen_kappa': 0.5951325316097944}
[2023-02-13T15-35-20] Validation: Total Loss: 0.9594 Total Acc: {'accuracy': 0.7836454, 'cohen_kappa': 0.6249148324526933}
epoch 50 step 0
[2023-02-13T15-35-22] Training Step: 151/192 50.0, batch_size: 12 Loss: 0.8429 Acc: {'accuracy': tensor(0.9003), 'cohen_kappa': 0.846054296680194}
epoch 50 step 1
[2023-02-13T15-35-23] Training Step: 152/192 50.1, batch_size: 12 Loss: 0.8932 Acc: {'accuracy': tensor(0.8493), 'cohen_kappa': 0.7606021740591531}
epoch 50 step 2
[2023-02-13T15-35-25] Training Step: 153/192 50.2, batch_size: 12 Loss: 0.8840 Acc: {'accuracy': tensor(0.8583), 'cohen_kappa': 0.762017046041138}
epoch 51 step 0
[2023-02-13T15-35-27] Training Step: 154/192 51.0, batch_size: 12 Loss: 0.8556 Acc: {'accuracy': tensor(0.8873), 'cohen_kappa': 0.8224252509709693}
epoch 51 step 1
[2023-02-13T15-35-29] Training Step: 155/192 51.1, batch_size: 12 Loss: 0.8735 Acc: {'accuracy': tensor(0.8682), 'cohen_kappa': 0.7962875640871826}
epoch 51 step 2
[2023-02-13T15-35-30] Training Step: 156/192 51.2, batch_size: 12 Loss: 0.8672 Acc: {'accuracy': tensor(0.8758), 'cohen_kappa': 0.7973093801376726}
validation step 0
[2023-02-13T15-35-32] Validation Step: 1/2, batch_size: 4 Loss: 0.9794 Acc: {'accuracy': tensor(0.7546), 'cohen_kappa': 0.575120201410892}
validation step 1
[2023-02-13T15-35-32] Validation Step: 2/2, batch_size: 4 Loss: 1.0485 Acc: {'accuracy': tensor(0.6862), 'cohen_kappa': 0.30063642132551704}
[2023-02-13T15-35-32] Validation: Total Loss: 1.0117 Total Acc: {'accuracy': 0.72262514, 'cohen_kappa': 0.44676896487293194}
epoch 52 step 0
[2023-02-13T15-35-34] Training Step: 157/192 52.0, batch_size: 12 Loss: 0.8805 Acc: {'accuracy': tensor(0.8666), 'cohen_kappa': 0.7758320755032168}
epoch 52 step 1
[2023-02-13T15-35-36] Training Step: 158/192 52.1, batch_size: 12 Loss: 0.8375 Acc: {'accuracy': tensor(0.9052), 'cohen_kappa': 0.8501358718271644}
epoch 52 step 2
[2023-02-13T15-35-37] Training Step: 159/192 52.2, batch_size: 12 Loss: 0.8923 Acc: {'accuracy': tensor(0.8501), 'cohen_kappa': 0.7689108408884857}
epoch 53 step 0
[2023-02-13T15-35-39] Training Step: 160/192 53.0, batch_size: 12 Loss: 0.8939 Acc: {'accuracy': tensor(0.8516), 'cohen_kappa': 0.7697062683022701}
epoch 53 step 1
[2023-02-13T15-35-41] Training Step: 161/192 53.1, batch_size: 12 Loss: 0.8743 Acc: {'accuracy': tensor(0.8678), 'cohen_kappa': 0.7974120909307408}
epoch 53 step 2
[2023-02-13T15-35-42] Training Step: 162/192 53.2, batch_size: 12 Loss: 0.8500 Acc: {'accuracy': tensor(0.8928), 'cohen_kappa': 0.8180821910266342}
validation step 0
[2023-02-13T15-35-44] Validation Step: 1/2, batch_size: 4 Loss: 0.9246 Acc: {'accuracy': tensor(0.8238), 'cohen_kappa': 0.7092738284286644}
validation step 1
[2023-02-13T15-35-44] Validation Step: 2/2, batch_size: 4 Loss: 0.9587 Acc: {'accuracy': tensor(0.7825), 'cohen_kappa': 0.5499626942061473}
[2023-02-13T15-35-44] Validation: Total Loss: 0.9406 Total Acc: {'accuracy': 0.8044964, 'cohen_kappa': 0.63477842084806}
epoch 54 step 0
[2023-02-13T15-35-46] Training Step: 163/192 54.0, batch_size: 12 Loss: 0.8678 Acc: {'accuracy': tensor(0.8742), 'cohen_kappa': 0.7979842448046471}
epoch 54 step 1
[2023-02-13T15-35-47] Training Step: 164/192 54.1, batch_size: 12 Loss: 0.8547 Acc: {'accuracy': tensor(0.8898), 'cohen_kappa': 0.8335461403333035}
epoch 54 step 2
[2023-02-13T15-35-49] Training Step: 165/192 54.2, batch_size: 12 Loss: 0.8854 Acc: {'accuracy': tensor(0.8572), 'cohen_kappa': 0.7593215894813435}
epoch 55 step 0
[2023-02-13T15-35-51] Training Step: 166/192 55.0, batch_size: 12 Loss: 0.8874 Acc: {'accuracy': tensor(0.8562), 'cohen_kappa': 0.7723535081903738}
epoch 55 step 1
[2023-02-13T15-35-52] Training Step: 167/192 55.1, batch_size: 12 Loss: 0.8683 Acc: {'accuracy': tensor(0.8745), 'cohen_kappa': 0.7938709453507239}
epoch 55 step 2
[2023-02-13T15-35-54] Training Step: 168/192 55.2, batch_size: 12 Loss: 0.8976 Acc: {'accuracy': tensor(0.8451), 'cohen_kappa': 0.7540273620293882}
validation step 0
[2023-02-13T15-35-56] Validation Step: 1/2, batch_size: 4 Loss: 0.9432 Acc: {'accuracy': tensor(0.7957), 'cohen_kappa': 0.6732921400265585}
validation step 1
[2023-02-13T15-35-56] Validation Step: 2/2, batch_size: 4 Loss: 0.9425 Acc: {'accuracy': tensor(0.7983), 'cohen_kappa': 0.605914024397048}
[2023-02-13T15-35-56] Validation: Total Loss: 0.9429 Total Acc: {'accuracy': 0.7969221, 'cohen_kappa': 0.6417854897831254}
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
epoch 56 step 0
[2023-02-13T15-35-58] Training Step: 169/192 56.0, batch_size: 12 Loss: 0.8547 Acc: {'accuracy': tensor(0.8878), 'cohen_kappa': 0.8230902366093013}
epoch 56 step 1
[2023-02-13T15-35-59] Training Step: 170/192 56.1, batch_size: 12 Loss: 0.8772 Acc: {'accuracy': tensor(0.8649), 'cohen_kappa': 0.786548842697036}
epoch 56 step 2
[2023-02-13T15-36-01] Training Step: 171/192 56.2, batch_size: 12 Loss: 0.8509 Acc: {'accuracy': tensor(0.8913), 'cohen_kappa': 0.830077996007112}
epoch 57 step 0
[2023-02-13T15-36-03] Training Step: 172/192 57.0, batch_size: 12 Loss: 0.8654 Acc: {'accuracy': tensor(0.8769), 'cohen_kappa': 0.8009668754259991}
epoch 57 step 1
[2023-02-13T15-36-05] Training Step: 173/192 57.1, batch_size: 12 Loss: 0.8785 Acc: {'accuracy': tensor(0.8634), 'cohen_kappa': 0.77522015666332}
epoch 57 step 2
[2023-02-13T15-36-07] Training Step: 174/192 57.2, batch_size: 12 Loss: 0.8602 Acc: {'accuracy': tensor(0.8817), 'cohen_kappa': 0.8217523507343993}
validation step 0
[2023-02-13T15-36-08] Validation Step: 1/2, batch_size: 4 Loss: 0.8616 Acc: {'accuracy': tensor(0.8823), 'cohen_kappa': 0.7992412217989857}
validation step 1
[2023-02-13T15-36-08] Validation Step: 2/2, batch_size: 4 Loss: 0.9498 Acc: {'accuracy': tensor(0.7905), 'cohen_kappa': 0.564126666085013}
[2023-02-13T15-36-09] Validation: Total Loss: 0.9028 Total Acc: {'accuracy': 0.8393599, 'cohen_kappa': 0.6892994105029768}
epoch 58 step 0
[2023-02-13T15-36-10] Training Step: 175/192 58.0, batch_size: 12 Loss: 0.8533 Acc: {'accuracy': tensor(0.8897), 'cohen_kappa': 0.833603056308877}
epoch 58 step 1
[2023-02-13T15-36-12] Training Step: 176/192 58.1, batch_size: 12 Loss: 0.8648 Acc: {'accuracy': tensor(0.8780), 'cohen_kappa': 0.7985609395052748}
epoch 58 step 2
[2023-02-13T15-36-14] Training Step: 177/192 58.2, batch_size: 12 Loss: 0.8808 Acc: {'accuracy': tensor(0.8610), 'cohen_kappa': 0.7714629032613715}
epoch 59 step 0
[2023-02-13T15-36-16] Training Step: 178/192 59.0, batch_size: 12 Loss: 0.8555 Acc: {'accuracy': tensor(0.8882), 'cohen_kappa': 0.8184691485459297}
epoch 59 step 1
[2023-02-13T15-36-18] Training Step: 179/192 59.1, batch_size: 12 Loss: 0.8860 Acc: {'accuracy': tensor(0.8563), 'cohen_kappa': 0.7682692899886663}
epoch 59 step 2
[2023-02-13T15-36-19] Training Step: 180/192 59.2, batch_size: 12 Loss: 0.8721 Acc: {'accuracy': tensor(0.8706), 'cohen_kappa': 0.8013438561506172}
validation step 0
[2023-02-13T15-36-21] Validation Step: 1/2, batch_size: 4 Loss: 0.8891 Acc: {'accuracy': tensor(0.8545), 'cohen_kappa': 0.7604057187398828}
validation step 1
[2023-02-13T15-36-21] Validation Step: 2/2, batch_size: 4 Loss: 0.9392 Acc: {'accuracy': tensor(0.8025), 'cohen_kappa': 0.6049966526187271}
[2023-02-13T15-36-21] Validation: Total Loss: 0.9126 Total Acc: {'accuracy': 0.8301872, 'cohen_kappa': 0.6877349554632488}
epoch 60 step 0
[2023-02-13T15-36-22] Training Step: 181/192 60.0, batch_size: 12 Loss: 0.8743 Acc: {'accuracy': tensor(0.8678), 'cohen_kappa': 0.7713323152300866}
epoch 60 step 1
[2023-02-13T15-36-24] Training Step: 182/192 60.1, batch_size: 12 Loss: 0.8789 Acc: {'accuracy': tensor(0.8642), 'cohen_kappa': 0.7921916282357739}
epoch 60 step 2
[2023-02-13T15-36-26] Training Step: 183/192 60.2, batch_size: 12 Loss: 0.8409 Acc: {'accuracy': tensor(0.9014), 'cohen_kappa': 0.8403233046193458}
epoch 61 step 0
[2023-02-13T15-36-28] Training Step: 184/192 61.0, batch_size: 12 Loss: 0.8806 Acc: {'accuracy': tensor(0.8610), 'cohen_kappa': 0.7756397329802833}
epoch 61 step 1
[2023-02-13T15-36-29] Training Step: 185/192 61.1, batch_size: 12 Loss: 0.8299 Acc: {'accuracy': tensor(0.9133), 'cohen_kappa': 0.8577537559934313}
epoch 61 step 2
[2023-02-13T15-36-31] Training Step: 186/192 61.2, batch_size: 12 Loss: 0.8698 Acc: {'accuracy': tensor(0.8722), 'cohen_kappa': 0.8040348443916918}
validation step 0
[2023-02-13T15-36-33] Validation Step: 1/2, batch_size: 4 Loss: 0.9866 Acc: {'accuracy': tensor(0.7532), 'cohen_kappa': 0.585845921612177}
validation step 1
[2023-02-13T15-36-33] Validation Step: 2/2, batch_size: 4 Loss: 1.0064 Acc: {'accuracy': tensor(0.7260), 'cohen_kappa': 0.40695890633576015}
[2023-02-13T15-36-33] Validation: Total Loss: 0.9958 Total Acc: {'accuracy': 0.74047786, 'cohen_kappa': 0.5021966451260709}
epoch 62 step 0
[2023-02-13T15-36-34] Training Step: 187/192 62.0, batch_size: 12 Loss: 0.8803 Acc: {'accuracy': tensor(0.8625), 'cohen_kappa': 0.7648468944182463}
epoch 62 step 1
[2023-02-13T15-36-36] Training Step: 188/192 62.1, batch_size: 12 Loss: 0.8553 Acc: {'accuracy': tensor(0.8874), 'cohen_kappa': 0.8104378903968414}
epoch 62 step 2
[2023-02-13T15-36-37] Training Step: 189/192 62.2, batch_size: 12 Loss: 0.8808 Acc: {'accuracy': tensor(0.8607), 'cohen_kappa': 0.787797484498777}
epoch 63 step 0
[2023-02-13T15-36-40] Training Step: 190/192 63.0, batch_size: 12 Loss: 0.8507 Acc: {'accuracy': tensor(0.8917), 'cohen_kappa': 0.8286021139700592}
epoch 63 step 1
[2023-02-13T15-36-41] Training Step: 191/192 63.1, batch_size: 12 Loss: 0.8727 Acc: {'accuracy': tensor(0.8693), 'cohen_kappa': 0.7869636345193906}
epoch 63 step 2
[2023-02-13T15-36-43] Training Step: 192/192 63.2, batch_size: 12 Loss: 0.8672 Acc: {'accuracy': tensor(0.8759), 'cohen_kappa': 0.79844451165435}
validation step 0
[2023-02-13T15-36-44] Validation Step: 1/2, batch_size: 4 Loss: 1.0140 Acc: {'accuracy': tensor(0.7332), 'cohen_kappa': 0.5853699656118463}
validation step 1
[2023-02-13T15-36-44] Validation Step: 2/2, batch_size: 4 Loss: 0.9351 Acc: {'accuracy': tensor(0.8037), 'cohen_kappa': 0.6095376014186584}
[2023-02-13T15-36-44] Validation: Total Loss: 0.9771 Total Acc: {'accuracy': 0.7661687, 'cohen_kappa': 0.5966709828852115}
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saving final checkpoint!
saveME_checkpoint: start saving checkpoint!
saveME_checkpoint: saved
saving inference model
saveME_torch: start saving of the model for later inference only!
saveME_torch: saved
saving entire model
saveME_torch: start saving of the entire model!
saveME_torch: saved
True
After the long training, we want to test our model on the chosen test tiles.
We load the model that achieved the best validation loss using the BaseClass from AugmentME.
model = AugmentME.BaseClass(mode="torch")
model.load(os.path.join(config["dir_results"],config["model_savename_bestloss"]))
loadME_torch: start loading of the entire model!
loadME_torch: loaded
True
Since the testing loop closely mirrors the validation part of our training procedure, we do not discuss it in detail here.
#%% testing loop
print('Start testing...')
model.eval()
losss_test = []
accs_test = []
weights_test = []
with torch.no_grad():
    for step_test, (x_test, y_test, mask_test, idx_test) in enumerate(dataloader_test):
        print('Test step %i'%(step_test))
        #%%%% clean cache of GPU
        torch.cuda.empty_cache()
        #%%%% forward pass
        if type(x_test)==list:
            out_test = model.forward([item_.to(config["device"]) for item_ in x_test])
        else:
            out_test = model.forward(x_test.to(config["device"]))
        #%%%% compute loss (masked mean over valid pixels only)
        loss_test = loss_function(out_test.softmax(1),y_test.squeeze(1).to(config["device"]))
        loss_test = (loss_test*mask_test.long().squeeze(1).to(config["device"])).sum() / (torch.count_nonzero(mask_test.long().to(config["device"])))
        #%%%% compute metric
        if type(metric)==list:
            test_acc = [metric_(out_test.cpu().detach(),y_test.cpu().detach(),mask_test.cpu().detach()) for metric_ in metric]
        else:
            test_acc = metric(out_test.cpu().detach(),y_test.cpu().detach(),mask_test.cpu().detach())
        #%%%% printing stuff
        print(
            "[{}] Test Step: {:d}/{:d}, \tbatch_size: {} \tLoss: {:.4f} \tAcc: {}".format(
                dt.datetime.now().strftime("%Y-%m-%dT%H-%M-%S"),
                step_test+1,
                len(dataloader_test),
                dataloader_test.batch_size,
                loss_test.mean(),
                {metric_.__name__:test_acc_ for metric_,test_acc_ in zip(metric,test_acc)} if type(metric)==list else test_acc
            )
        )
        #%%%% collect loss and accuracy
        losss_test.append(loss_test.cpu().detach().numpy())
        accs_test.append(test_acc)
        weights_test.append(torch.count_nonzero(mask_test).cpu().detach().numpy())
        #%%%% plot
        #%%%%% calculations for plot
        prediction_test = torch.argmax(out_test,1).cpu()
        eopatches = [EOPatch.load(dataset_test.paths[idx_.cpu()]) for idx_ in idx_test[0]]
        imgs_swir = [eopatch[(FeatureType.DATA,"data")][...,[-1,-3,-4]].squeeze() for eopatch in eopatches]
        imgs_true = [eopatch[(FeatureType.DATA,"data")][...,[0,1,2]].squeeze() for eopatch in eopatches]
        #%%%%% batch plot
        fig, axis = plt.subplots(nrows=4, ncols=dataloader_test.batch_size, figsize=(5*dataloader_test.batch_size,5*4))
        axis[0][0].set_ylabel("Prediction")
        axis[1][0].set_ylabel("Reference")
        axis[2][0].set_ylabel("SWIR Image")
        axis[3][0].set_ylabel("True Color Image")
        for i in range(dataloader_test.batch_size):
            axis[0][i].imshow(prediction_test[i],vmin=0,vmax=config["num_classes"],cmap=config["cmap_reference"])
            axis[0][i].set_yticks([])
            axis[0][i].set_xticks([])
            axis[1][i].imshow(y_test.squeeze(1)[i].cpu(),vmin=0,vmax=config["num_classes"],cmap=config["cmap_reference"])
            axis[1][i].set_yticks([])
            axis[1][i].set_xticks([])
            axis[2][i].imshow(imgs_swir[i]*2.5)
            axis[2][i].set_yticks([])
            axis[2][i].set_xticks([])
            axis[3][i].imshow(imgs_true[i]*2.5)
            axis[3][i].set_yticks([])
            axis[3][i].set_xticks([])
        plt.subplots_adjust(left=0, bottom=0.05, right=1, top=0.95, wspace=0.1, hspace=0)
        plt.show()

#%%%% total loss and accuracy (weighted by the number of valid pixels per batch)
total = np.sum([np.sum(weight_) for weight_ in weights_test])
loss_test_total = np.sum([weight_/total*loss_ for weight_,loss_ in zip(weights_test,losss_test)])
if type(metric)==list:
    acc_test_total = [np.sum([weight_/total*acc_[i] for weight_,acc_ in zip(weights_test,accs_test)]) for i in range(len(metric))]
else:
    acc_test_total = np.sum([weight_/total*acc_ for weight_,acc_ in zip(weights_test,accs_test)])

# print total values
print(
    "[{}] Test: \tTotal Loss: {:.4f} \tTotal Acc: {}".format(
        dt.datetime.now().strftime("%Y-%m-%dT%H-%M-%S"),
        loss_test_total,
        {metric_.__name__:test_acc_ for metric_,test_acc_ in zip(metric,acc_test_total)} if type(metric)==list else acc_test_total
    )
)

#%%% write to tensorboard
#%%%% log loss
writer.add_scalar(f'LossTest/{type(loss_function).__name__}', loss_test_total, global_step=step_test)
#%%%% log metric
if type(metric)==list:
    writer.add_scalars('AccuracyTest',{metric_.__name__:test_acc_ for metric_,test_acc_ in zip(metric,acc_test_total)},global_step=step_test)
else:
    writer.add_scalar('AccuracyTest', acc_test_total, global_step=step_test)
print()
Start testing...
Test step 0
[2023-02-13T15-44-07] Test Step: 1/2, batch_size: 4 Loss: 0.9858 Acc: {'accuracy': tensor(0.7514), 'cohen_kappa': 0.5258390902529031}
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Test step 1
[2023-02-13T15-44-09] Test Step: 2/2, batch_size: 4 Loss: 0.9805 Acc: {'accuracy': tensor(0.7578), 'cohen_kappa': 0.6144480004861541}
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
[2023-02-13T15-44-11] Test: Total Loss: 0.9832 Total Acc: {'accuracy': 0.754561, 'cohen_kappa': 0.5697383328759842}
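The total loss reported above is a weighted mean of the per-step losses, where each step contributes proportionally to its number of valid (unmasked) pixels. A minimal, self-contained sketch of this aggregation, using made-up losses and pixel counts purely for illustration:

```python
# Illustrative per-batch test losses and valid-pixel counts; the numbers are
# hypothetical and do not come from the run above.
losses = [0.9858, 0.9805]
weights = [40000, 38000]

# Weighted mean: each batch contributes proportionally to its valid pixels,
# mirroring the weights_test/losss_test aggregation in the testing loop.
total = sum(weights)
loss_total = sum(w / total * l for w, l in zip(weights, losses))
print(round(loss_total, 4))
```

This weighting matters because the last batch of an epoch (or patches with many masked pixels) would otherwise be over-represented in a plain arithmetic mean.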
Our model has been trained and tested. Hence, we free the GPU by deleting the model and its associated variables.
del(model)
del(optimizer)
del(x)
del(y)
del(mask)
del(x_validation)
del(y_validation)
del(mask_validation)
del(x_test)
del(y_test)
del(mask_test)
del(loss)
del(loss_validation)
del(loss_test)
del(grad)
torch.cuda.empty_cache()
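Note that the bare `del` statements above raise a `NameError` if any of the names was never bound, for example when an earlier cell was skipped. A more defensive sketch, with an illustrative (hypothetical) list of names, drops only the bindings that actually exist:

```python
# Remove only those variables that are actually bound in the notebook's
# global namespace; the name list here is illustrative.
for name in ["model", "optimizer", "loss", "grad"]:
    if name in globals():
        del globals()[name]  # safe even if the name was never created
```

After dropping the references, `torch.cuda.empty_cache()` can release the cached GPU memory as in the cell above.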
Finally, after training, validation and testing, we can have a look at the TensorBoard. Please make sure that TensorBoard is running!
notebook.list()
print('\nPlease check, if the port is correct and tensorboard is running!\n')
notebook.display(port=6006,height=1000)
Known TensorBoard instances:
  - port 6006: logdir ./Example_DeforestationDetection/DeforestationDetectionRun/results/tensorboard/ (started 0:04:57 ago; pid 117201)

Please check, if the port is correct and tensorboard is running!

Selecting TensorBoard with logdir ./Example_DeforestationDetection/DeforestationDetectionRun/results/tensorboard/ (started 0:04:57 ago; port 6006, pid 117201).